Abstract

A new method for analyzing low density parity check (LDPC) codes and low density generator
matrix (LDGM) codes under bit maximum a posteriori probability (MAP) decoding is introduced.
The method is based on a rigorous approach to spin glasses developed by Francesco Guerra.
It allows one to construct lower bounds on the entropy of the transmitted message conditional
on the received one. Based on heuristic statistical mechanics calculations, we conjecture
these bounds to be tight. The result holds for standard irregular ensembles used over
binary-input output-symmetric channels.

The method is first developed for Tanner graph ensembles with Poisson left degree distribution.
It is then generalized to 'multi-Poisson' graphs and, by a completion procedure, to
arbitrary degree distributions.
1 Introduction



Codes based on random graphs are of great practical and theoretical relevance. The analysis
of such communication schemes currently has a mixed status. From a practical point of
view, the most relevant issue is the analysis of linear-time decoding algorithms. As far as
message-passing algorithms are concerned, our understanding is rather satisfactory. Density
evolution [1-4] allows one to compute exact thresholds for vanishing bit error probability, at
least in the large blocklength limit. These results have been successfully employed for designing
capacity-approaching code ensembles [5,6].

A more classical problem is the evaluation of decoding schemes which are optimal with
respect to some fidelity criterion, such as word MAP (minimizing the block error probability) or
symbol MAP decoding (minimizing the bit error probability). At present this issue has smaller
practical relevance than the previous one; nonetheless its theoretical interest is great. Any
progress in this direction would improve our understanding of the effectiveness of belief
propagation (and similar message-passing algorithms) in general inference problems. Unfortunately,
the status of this research area [7, 10] is not as advanced as the analysis of iterative techniques. In
most cases one is able to provide only upper and lower bounds on the thresholds for vanishing
bit error probability in the large blocklength limit. Moreover, the existing techniques seem
completely unrelated to the ones employed in the analysis of iterative decoding. This is
puzzling. We know that, at least for some code constructions and some channel models, belief
propagation performs close to optimally. The same has been observed empirically
in inference problems. Such a convergence in behavior hints at a desirable convergence
in the analysis techniques.

This paper aims at bridging this gap. We introduce a new technique which allows one to derive
lower bounds on the entropy of the transmitted message conditional on the received one. We
conjecture that the lower bound provided by this approach is indeed tight. Interestingly
enough, the basic objects involved in the new bounds are probability densities over $\mathbb{R}$, as in
the density evolution analysis of iterative decoding. Moreover, these densities are required to
satisfy the same 'symmetry' condition (see Sec. 4 for a definition) as the message distributions
in density evolution. The bound can be optimized with respect to the densities. A necessary
condition for the densities to be optimal is that they correspond to a fixed point of density
evolution for belief propagation decoding.

The method presented in this paper is based on recent developments in the rigorous theory
of mean field spin glasses. Mean field spin glasses are theoretical models for disordered
magnetic alloys, displaying an extremely rich probabilistic structure [12, 13]. As shown by Nicolas
Sourlas [14-16], there exists a precise mathematical correspondence between such models and
error correcting codes. Exploiting this correspondence, a number of heuristic techniques from
statistical mechanics have been applied to the analysis of coding systems, including LDPC
codes [17-20] and turbo codes [21,22]. Unfortunately, the results obtained through this approach
were non-rigorous although, most of the time, they were expected to be exact.

Recently, Francesco Guerra and Fabio Lucio Toninelli [23, 24] succeeded in developing a
general technique for constructing bounds on the free energy of mean field spin glasses. The
technique, initially applied to the Sherrington-Kirkpatrick model, was later extended by Franz
and Leone [25] to deal with Ising systems on random graphs with Poisson distributed degrees.
Finally, Franz, Leone and Toninelli [26] adapted it to systems on graphs with general degree
distributions. This paper adds two improvements to this line of research. It generalizes it to
Ising systems with (some classes of) biased coupling distributions^1. Furthermore, it introduces
a new way of dealing with general degree distributions which (in our view) is considerably
simpler than the approach of Ref. [26]. Using the new technique, we are able to prove that
the asymptotic expression for the conditional entropy of irregular LDPC ensembles derived
in [20] is indeed a lower bound on the real conditional entropy. This gives further credit to the
expectation that the results of [20] are exact, and we formalize this expectation as a conjecture
in Sec. 10.

The new technique is based upon an interpolation procedure which progressively eliminates
right (parity check) nodes from the Tanner graph. This procedure is considerably simpler
for graph ensembles with Poisson left (variable) degree distribution. Such graphs can in
fact be constructed by adding one uniformly random right node at a time, independently of
the others. We shall therefore adopt a three-step strategy. We first prove our bound for
Poisson ensembles. This allows us to explain the important ideas of the interpolation technique
in the simplest possible context. Unfortunately, Poisson ensembles may have quite bad error-floor
properties due to low-degree nodes, and are not very interesting for practical purposes^2. Next,
we generalize the bound to 'multi-Poisson' ensembles. These can be constructed by a sequence
of rounds such that, within each round, right nodes are added independently of each other. In
other words, multi-Poisson graphs are obtained as the superposition of several Poisson graphs.
Finally, we show that a general degree distribution can be approximated arbitrarily well using
a 'multi-Poisson' construction. Together with continuity of the bound, this implies our general
result.

In Section 2 we introduce the code ensembles to be considered. The symbol-MAP decoding
scheme is defined in Sec. 3, together with some basic probabilistic notation. Section 4 collects
some remarks on symmetric random variables to be used in the proof of our main results. We
then prove that the per-bit conditional entropy of the transmitted message concentrates in
probability with respect to the code realization. This serves as a justification for considering
its ensemble average. Our main result, i.e. a lower bound on the average conditional entropy,
is stated in Sec. 6. This Section also contains the proof for Poisson ensembles. The proofs for
multi-Poisson and standard ensembles are provided (respectively) in Sections 7 and 8. Section 9
presents several applications of the new bound together with a general strategy for optimizing
it. Finally, we draw our conclusions and discuss extensions of our work in Section 10. Several
technical calculations are deferred to the Appendices.

2 Code ensembles 

In this Section we define the code ensembles to be analyzed in the rest of the paper. By
'standard ensembles' we refer to the irregular ensembles considered, e.g., in Refs. [2,3]. Poisson
ensembles are characterized by a Poisson left degree distribution. Finally, multi-Poisson codes
can be thought of as 'combinations' of Poisson codes, and are mainly a theoretical device for
approximating standard ensembles.

For each of the three families, we shall proceed by introducing a family of Tanner graph
ensembles. In order to specify a Tanner graph, we need to exhibit a set of left (variable)
nodes $V$, of size $|V| = n$, a set of right (check) nodes $C$, with $|C| = m$, and a set of edges $\mathcal{E}$,

1 The reader unfamiliar with the statistical physics jargon may skip this statement. 

2 One exception to this statement is provided by Luby Transform codes [27]. These can be regarded as Poisson
ensembles and, due to the large average right degree, have an arbitrarily small error floor.
each edge joining a left and a right node. If $i \in V$ denotes a generic left node and $a \in C$ a
generic right node, an edge joining them will be given as $(i, a) \in \mathcal{E}$. Multiple edges are allowed
(although only their parity matters for the code definition). Furthermore, two graphs obtained
through a permutation of the variable or of the check nodes are regarded as distinct (nodes
are 'labeled'). The neighborhood of the variable (check) node $i \in V$ ($a \in C$) is denoted by $\partial i$
($\partial a$). In formulae, $\partial i = \{a \in C : (i,a) \in \mathcal{E}\}$ and $\partial a = \{i \in V : (i,a) \in \mathcal{E}\}$.

A Tanner graph ensemble will be generically indicated as $(\cdots)$, where '$\cdots$' is a set of relevant
parameters. Expectation with respect to a Tanner graph ensemble will be denoted by $E_{\mathcal{G}}$.

Next, we define LDPC$(\cdots)$ and LDGM$(\cdots)$ codes as the LDPC and LDGM codes associated
to a random Tanner graph from the $(\cdots)$ ensemble. Since this construction does not depend
upon the particular family of Tanner graphs to be considered, we formalize it here.

Definition 1 Let $\mathbb{H} = \{H_{ai} : a \in C;\ i \in V\}$ be the adjacency matrix of a Tanner graph from the
ensemble $(\cdots)$: $H_{ai} = 1$ if $(i, a)$ appears in $\mathcal{E}$ an odd number of times, and $H_{ai} = 0$ otherwise.
Then

1. A code from the LDGM$(\cdots)$ ensemble is the linear code on GF[2] having $\mathbb{H}$ as generator
matrix. The design rate of this ensemble is defined as $r_{\rm des} = n/E_{\mathcal{G}}m$.

2. A code from the LDPC$(\cdots)$ ensemble is the linear code on GF[2] having $\mathbb{H}$ as parity
check matrix. The design rate of this ensemble is $r_{\rm des} = 1 - E_{\mathcal{G}}m/n$.

Before giving the actual definitions of graph ensembles, it is convenient to introduce some notation
for describing them. For a given graph we define the degree profile $(\hat\Lambda, \hat P)$ as a pair of
polynomials

$$\hat\Lambda(x) = \sum_{l=2}^{l_{\max}} \hat\Lambda_l\, x^l , \qquad \hat P(x) = \sum_{k=2}^{k_{\max}} \hat P_k\, x^k , \qquad (2.1)$$

such that $\hat\Lambda_l$ ($\hat P_k$) is the fraction of left (right) nodes of degree $l$ ($k$). The degree profile $(\hat\Lambda, \hat P)$ will
be in general a random variable depending on the particular graph realization. On the other
hand, each ensemble will be assigned a non-random 'design degree sequence' $(\Lambda, P)$. This is
the degree profile that the ensemble is designed to achieve (and in some cases achieves with
probability approaching one in the large blocklength limit). Both $\Lambda(x)$ and $P(x)$ will have
non-negative coefficients and satisfy the normalization condition $\Lambda(1) = P(1) = 1$. Finally, it
is useful to introduce the 'edge perspective' degree sequences: $\lambda(x) = \sum_l \lambda_l x^{l-1} = \Lambda'(x)/\Lambda'(1)$,
and $\rho(x) = \sum_k \rho_k x^{k-1} = P'(x)/P'(1)$.
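
As an illustration of the edge-perspective sequences just defined, the following minimal sketch converts node-perspective fractions into edge-perspective ones, using the fact that the coefficient of $x^{l-1}$ in $\Lambda'(x)/\Lambda'(1)$ is $l\Lambda_l/\Lambda'(1)$. The function name `edge_perspective` and the example degree sequence are assumptions made for illustration only; they do not appear in the text.

```python
from fractions import Fraction

def edge_perspective(node_fracs):
    """Map node-perspective fractions {degree: fraction} to edge-perspective ones.

    The edge-perspective coefficient of degree l is l * Lambda_l / Lambda'(1),
    i.e. the fraction of edges attached to a node of degree l.
    """
    avg_degree = sum(d * f for d, f in node_fracs.items())  # Lambda'(1)
    return {d: d * f / avg_degree for d, f in node_fracs.items()}

# Hypothetical example: half of the left nodes have degree 2, half degree 4.
Lambda = {2: Fraction(1, 2), 4: Fraction(1, 2)}
lam = edge_perspective(Lambda)
```

The same function applies unchanged to the right degree sequence $P$, giving the coefficients of $\rho(x)$.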

In the following Sections, 'with high probability' (w.h.p.) and similar expressions will refer 
to the large blocklength limit, with the other code parameters kept fixed. 

2.1 Standard ensembles 

Standard ensembles are discussed in several papers [2-5,29,30]. Their performance under
iterative decoding has been thoroughly investigated, allowing for ensemble optimization [2,6].
A standard ensemble of factor graphs is defined by assigning the blocklength $n$ and the design
degree sequence $(\Lambda, P)$. We shall assume the maximum left and right degrees $l_{\max}$ and $k_{\max}$
to be finite.

Definition 2 A graph from the standard ensemble $(n, \Lambda, P)$ includes $n$ left nodes and $m =
n\Lambda'(1)/P'(1)$ right nodes (i.e. $V = [n]$ and $C = [m]$), and is constructed as follows. Partition
the set of left nodes uniformly at random into $l_{\max}$ subsets $\{V_l\}$ with $|V_l| = n\Lambda_l$. For any
$l = 2, \dots, l_{\max}$, associate $l$ 'sockets' to each $i \in V_l$. Analogously, partition the right nodes into
sets $\{C_k\}$ with $|C_k| = mP_k$, and associate $k$ sockets to the nodes in $C_k$. Notice that the total
number of sockets on each side is $n\Lambda'(1)$.

Choose a uniformly random permutation over $n\Lambda'(1)$ objects and connect the sockets
accordingly (two connected sockets form an edge).

The ensemble $(n, \Lambda, P)$ is non-empty only if the numbers $n\Lambda'(1)/P'(1)$ and $\{n\Lambda_l, mP_k\}$ are
integers. The design rate of the LDPC$(n, \Lambda, P)$ ensemble is $r_{\rm des} = 1 - \Lambda'(1)/P'(1)$, while for
the LDGM$(n, \Lambda, P)$ ensemble $r_{\rm des} = P'(1)/\Lambda'(1)$. It is clear that the degree profile $(\hat\Lambda, \hat P)$
concentrates around the design degree sequences $(\Lambda, P)$.

An equivalent construction of a graph in the standard ensemble is the following. As before,
we partition the nodes and associate sockets to them. Furthermore, we keep track of the
number of 'free' sockets at variable node $i$ after $t$ steps in the procedure through an integer
$d_i(t)$. At the beginning, set $d_i(0) = l$ for any $i \in V_l$. Next, for any $a = 1, \dots, m$
consider the $a$-th check node, and assume that $a \in C_k$. For $r = 1, \dots, k$ do the following
operations: (i) choose $i^a_r$ in $V$ with probability distribution $w_i(t) = d_i(t)/(\sum_j d_j(t))$; (ii) set
$d_{i^a_r}(t+1) = d_{i^a_r}(t) - 1$ and $d_i(t+1) = d_i(t)$ for any $i \neq i^a_r$; (iii) increment $t$ by 1. Finally, $a$ is
connected to $i^a_1, \dots, i^a_k$. The graph obtained after the last right node $a = m$ has been connected to
the left side is distributed according to the standard ensemble as defined above.
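
The sequential construction above can be sketched in a few lines. This is only an illustration under stated assumptions: degrees are passed as explicit per-node lists rather than through the design sequences, and the function name `sequential_standard_graph` and the toy degree lists are hypothetical.

```python
import random

def sequential_standard_graph(left_degrees, right_degrees, rng):
    """Connect check nodes one socket at a time.

    left_degrees[i]  = number of sockets of variable node i (initial free-socket count)
    right_degrees[a] = number of sockets of check node a
    Returns, for each check node, the list of variable nodes it is joined to.
    """
    assert sum(left_degrees) == sum(right_degrees), "socket counts must match"
    free = list(left_degrees)          # free sockets currently left at each node
    nodes = range(len(free))
    checks = []
    for k in right_degrees:
        neighbours = []
        for _ in range(k):
            # pick node i with probability (free sockets at i) / (all free sockets)
            i = rng.choices(nodes, weights=free, k=1)[0]
            free[i] -= 1
            neighbours.append(i)
        checks.append(neighbours)
    return checks

rng = random.Random(0)
left = [2, 2, 3, 3, 2]     # hypothetical left degrees (12 sockets)
right = [4, 4, 4]          # hypothetical right degrees (12 sockets)
checks = sequential_standard_graph(left, right, rng)
```

Because a node is chosen with probability proportional to its free sockets, each step amounts to picking a uniformly random free socket, and every left socket is used exactly once by the end.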

2.2 Poisson ensembles 

A Poisson ensemble is specified by the blocklength $n$, a real number $\gamma > 0$, and a right degree
design sequence $P(x)$. Again, we require the maximum right degree $k_{\max}$ to be finite.

Definition 3 A Tanner graph from the Poisson ensemble $(n, \gamma, P)$ is constructed as follows.
The graph has $n$ variable nodes $i \in V \equiv [n]$. For any $2 \le k \le k_{\max}$, choose $m_k$ from a
Poisson distribution with parameter $n\gamma P_k/P'(1)$. The graph has $m = \sum_k m_k$ check nodes

$$a \in C \equiv \{(\hat a, k) : \hat a \in [m_k],\ 2 \le k \le k_{\max}\} .$$

For each parity check node $a = (\hat a, k)$ choose $k$ variable nodes $i^a_1, \dots, i^a_k$ uniformly at random
in $V$, and connect $a$ with $i^a_1, \dots, i^a_k$.
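
A minimal sampling sketch of the Poisson ensemble follows. It assumes only the Python standard library; the helper `sample_poisson` (which counts unit-rate exponential arrivals), the function name `poisson_tanner_graph`, and the parameter values are illustrative choices, not part of the definition.

```python
import random

def sample_poisson(mean, rng):
    """Poisson variate, counted as unit-rate arrivals falling in [0, mean]."""
    count, time = 0, rng.expovariate(1.0)
    while time <= mean:
        count += 1
        time += rng.expovariate(1.0)
    return count

def poisson_tanner_graph(n, gamma, right_fracs, rng):
    """right_fracs = {k: P_k}. Returns a list of checks, each a list of k variable nodes."""
    avg_right = sum(k * p for k, p in right_fracs.items())    # P'(1)
    checks = []
    for k, p_k in sorted(right_fracs.items()):
        m_k = sample_poisson(n * gamma * p_k / avg_right, rng)
        for _ in range(m_k):
            # k variable nodes drawn uniformly and independently (repeats allowed)
            checks.append([rng.randrange(n) for _ in range(k)])
    return checks

rng = random.Random(1)
checks = poisson_tanner_graph(n=2000, gamma=3.0, right_fracs={6: 1.0}, rng=rng)
```

With these hypothetical parameters the expected number of checks is $n\gamma/P'(1) = 1000$, and the average left degree is close to $\gamma = 3$.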

A few remarks are in order; they are understood to hold in the large blocklength limit $n \to \infty$
with $\gamma$ and $P$ fixed. (i) The number of check nodes is a Poisson random variable with mean
$E_{\mathcal{G}}m = n\gamma/P'(1)$. Moreover, $m$ concentrates in probability around its expectation. (ii) The
right degree profile $\hat P$ concentrates around its expectation $E_{\mathcal{G}}\hat P_k = P_k + O(1/n)$. (iii) The
left degree profile $\hat\Lambda$ has expectation $E_{\mathcal{G}}\hat\Lambda_l = \gamma^l e^{-\gamma}/l! + O(1/n)$, and concentrates around its
average. In view of these remarks, we define the design left degree sequence $\Lambda$ of a Poisson
ensemble to be given by $\Lambda(x) = e^{\gamma(x-1)}$.

The design rate^3 of the LDGM$(n, \gamma, P)$ ensemble is $r_{\rm des} = P'(1)/\gamma$, while, for an LDPC$(n, \gamma, P)$
ensemble, $r_{\rm des} = 1 - \gamma/P'(1) = 1 - E_{\mathcal{G}}m/n$.

3 In the first case, the actual rate $r$ is equal to $n/m$, and therefore, because of observation (i) above, concentrates
around the design rate. In the LDPC$(n, \gamma, P)$ ensemble, the actual rate $r$ is always larger than or equal to $1 - m/n$, because
the rank of the parity check matrix $\mathbb{H}$ is not larger than $m$. Notice that $1 - m/n$ concentrates around the design rate.
It is not hard to show that the actual rate is, in fact, strictly larger than $r_{\rm des}$ with high probability. A lower bound
on the number of codewords $\mathcal{N}(\mathbb{H})$ can in fact be obtained by counting all the codewords such that $x_i = 0$ for all the
variable nodes $i$ with degree larger than 0. This implies $r \ge \hat\Lambda_0$. Take $\gamma$ and $P(x)$ such that $e^{-\gamma} > 1 - \gamma/P'(1) + \delta$
for some $\delta > 0$ (this can always be done by choosing $\gamma$ large enough). Since $\hat\Lambda_0$ is closely concentrated around $e^{-\gamma}$ in
the $n \to \infty$ limit, we have $r > r_{\rm des} + \delta$ with high probability.
The important simplification arising for Poisson ensembles is that their rate can be easily
changed in a continuous way. Consider for instance the problem of sampling a graph from the
$(n, \gamma + \Delta\gamma, P)$ ensemble. To first order in $\Delta\gamma$ this can be done as follows. First generate
a graph from the $(n, \gamma, P)$ ensemble. Then, for each $k \in \{3, \dots, k_{\max}\}$, add a check node of degree $k$
with probability $n\,\Delta\gamma\, P_k/P'(1)$, and connect it to $k$ uniformly random variable nodes $i_1, \dots, i_k$.
Technically, this property will allow us to compute derivatives with respect to the code rate, cf.
the Appendices.

2.3 Multi-Poisson ensembles 

We introduce multi-Poisson ensembles in order to 'approximate' graphs in standard
ensembles as the union of several Poisson sub-graphs. The construction proceeds by a finite
number of rounds. During each round, we add a certain number of right nodes to the graph.
The adjacent left nodes are drawn independently using a biased distribution. The bias drives
the procedure towards the design left degree distribution, and is most effective in the limit of a
large number of 'small' stages. A multi-Poisson ensemble is fully specified by the blocklength
$n$, a design degree sequence $(\Lambda, P)$ (with $\Lambda(x)$ and $P(x)$ having maximum degree, respectively,
$l_{\max}$ and $k_{\max}$), and a real number $\gamma > 0$ describing the number of checks to be added at
each round. The number of rounds is defined to be $t_{\max} \equiv \lfloor \Lambda'(1)/\gamma \rfloor - 1$. Below we adopt the
notation $[x]_+ = x$ if $x > 0$ and $[x]_+ = 0$ otherwise.

Definition 4 A Tanner graph $\mathcal{G}$ from the multi-Poisson ensemble $(n, \Lambda, P, \gamma)$ is defined by
the following procedure. The graph has $n$ variable nodes $i \in V \equiv [n]$, which are partitioned
uniformly at random into $l_{\max}$ subsets $\{V_l\}$, with $|V_l| = n\Lambda_l$. For each $l = 2, \dots, l_{\max}$ and each
$i \in V_l$, let $d_i(0) = l$. Let $\mathcal{G}_0$ be the graph with variable nodes $V$ and without any check node.
We shall define a sequence $\mathcal{G}_0, \dots, \mathcal{G}_{t_{\max}}$, and set $\mathcal{G} = \mathcal{G}_{t_{\max}}$.

For any $t = 0, \dots, t_{\max} - 1$, $\mathcal{G}_{t+1}$ is constructed from $\mathcal{G}_t$ as follows. For any $2 \le k \le k_{\max}$,
choose $m^{(t)}_k$ from a Poisson distribution with parameter $n\gamma P_k/P'(1)$. Add $m^{(t)} = \sum_k m^{(t)}_k$
check nodes to $\mathcal{G}_t$ and denote them by $C_t \equiv \{(\hat a, k, t) : \hat a \in [m^{(t)}_k],\ 2 \le k \le k_{\max}\}$. For each
node $a = (\hat a, k, t) \in C_t$, choose $k$ variable nodes $i^a_1, \dots, i^a_k$ independently in $V$, with distribution
$w_i(t) = [d_i(t)]_+/(\sum_j [d_j(t)]_+)$, and connect $a$ with $i^a_1, \dots, i^a_k$. Finally, set $d_i(t+1) = d_i(t) -
\Delta_i(t)$, where $\Delta_i(t)$ is the number of times node $i$ has been chosen during round $t$.

Notice that the above procedure may fail if $\sum_i [d_i(t)]_+$ vanishes for some $t \le t_{\max} - 1$. However,
it is easy to understand that, in the large blocklength limit, the procedure will succeed with
high probability (see the proof of Lemma 1 below).

The motivation for the above definition is that, as $\gamma \to 0$ at $n$ fixed, it reproduces the
definition of standard ensembles (see the formulation at the end of Sec. 2.1). At non-zero $\gamma$,
the multi-Poisson ensemble differs from the standard one in that the probabilities $w_i(t)$ are
updated only about every $n\gamma$ edge connections. On the other hand, we shall be able to analyze
multi-Poisson ensembles in the asymptotic limit $n \to \infty$ with the other parameters (the design
distributions $(\Lambda, P)$ and the stage 'step-size' $\gamma$) kept fixed. It is therefore crucial to estimate
the 'distance' between multi-Poisson and standard ensembles for large $n$ at fixed $\gamma$.
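
The round-based construction of the multi-Poisson ensemble can be sketched as follows. This is an illustrative reading of the definition, not a reference implementation: the names `multi_poisson_graph` and `sample_poisson`, the use of rounding for the subset sizes, and the example parameters are all assumptions. The key point it shows is that the sampling weights are frozen during a round and the counters are updated only at its end.

```python
import itertools
import math
import random

def sample_poisson(mean, rng):
    """Poisson variate, counted as unit-rate arrivals falling in [0, mean]."""
    count, time = 0, rng.expovariate(1.0)
    while time <= mean:
        count += 1
        time += rng.expovariate(1.0)
    return count

def multi_poisson_graph(n, left_fracs, right_fracs, gamma, rng):
    """left_fracs = {l: Lambda_l}, right_fracs = {k: P_k}.

    Returns (checks, d): checks is a list of (neighbours, round) pairs and
    d[i] is the residual counter of node i after the last round (may be negative).
    """
    avg_left = sum(l * f for l, f in left_fracs.items())      # Lambda'(1)
    avg_right = sum(k * p for k, p in right_fracs.items())    # P'(1)
    t_max = math.floor(avg_left / gamma) - 1
    d = [l for l, f in sorted(left_fracs.items()) for _ in range(round(n * f))]
    rng.shuffle(d)                       # random assignment of target degrees
    checks = []
    for t in range(t_max):
        # cumulative positive parts of the counters, frozen for the whole round
        cum = list(itertools.accumulate(max(x, 0) for x in d))
        if cum[-1] == 0:
            raise RuntimeError("no free sockets left: the procedure failed")
        hits = [0] * len(d)              # times each node is chosen this round
        for k, p_k in sorted(right_fracs.items()):
            for _ in range(sample_poisson(n * gamma * p_k / avg_right, rng)):
                nbrs = rng.choices(range(len(d)), cum_weights=cum, k=k)
                for i in nbrs:
                    hits[i] += 1
                checks.append((nbrs, t))
        d = [x - h for x, h in zip(d, hits)]
    return checks, d

rng = random.Random(0)
checks, residual = multi_poisson_graph(300, {3: 1.0}, {6: 1.0}, 0.5, rng)
```

In this hypothetical run there are $\lfloor 3/0.5 \rfloor - 1 = 5$ rounds, each adding on average $25$ checks of degree 6.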

We formalize this idea using a coupling argument. Let us recall here that, given two random
variables $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$, a coupling among them is a random variable $(X', Y') \in \mathcal{X} \times \mathcal{Y}$ such
that the marginal distributions of $X'$ and $Y'$ are the same as (respectively) those of $X$ and $Y$.
Furthermore, we define a 'rewiring' as the elementary operation of either adding or removing
a function node from a Tanner graph. The following Lemma is proved in Appendix A.
Lemma 1 Let $0 < \gamma < 1$ and $(\Lambda, P)$ be a degree sequence pair. Then there exist two $n$-independent
positive numbers $A(\Lambda, P), b(\Lambda, P) > 0$ and a coupling $(\mathcal{G}_{\rm s}, \mathcal{G}_{\rm mP})$ between the standard
ensemble $(n, \Lambda, P)$ and the multi-Poisson ensemble $(n, \Lambda, P, \gamma)$, such that w.h.p. $\mathcal{G}_{\rm s}$ is
obtained from $\mathcal{G}_{\rm mP}$ with a number of rewirings smaller than $A(\Lambda, P)\, n\, \gamma^{b(\Lambda, P)}$.

In other words, we can obtain a random Tanner graph from the standard ensemble by first
generating a multi-Poisson graph with the same design degree sequences and small $\gamma$, and then
changing a small fraction of edges.

Although it is convenient to define multi-Poisson ensembles in terms of the design degree
distribution $\Lambda(x)$, such a distribution does not coincide with the actual degree profile achieved
by the above construction, even in the large blocklength limit. In order to clarify this statement,
let us define the expected degree distribution $\Lambda^{(n,\gamma)}(x) = E_{\mathcal{G}}\hat\Lambda(x)$ for blocklength $n$ and step
parameter $\gamma$. Furthermore, let^4 $\Lambda^{(\gamma)}(x) = \lim_{n\to\infty} \Lambda^{(n,\gamma)}(x)$. We claim that, in general,
$\Lambda^{(\gamma)}(x) \neq \Lambda(x)$. This can be verified explicitly using the characterization of $\Lambda^{(\gamma)}(x)$ provided
in Appendix B. However, as a consequence of Lemma 1, for small $\gamma$, $\Lambda^{(\gamma)}(x)$ is 'close' to $\Lambda(x)$.



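Corollary 1 Let $0 < \gamma < 1$ and $(\Lambda, P)$ be a degree sequence pair. Then there exist two $n$-independent
positive numbers $A(\Lambda, P), b(\Lambda, P) > 0$ such that $|\Lambda^{(\gamma)}_l - \Lambda_l| < A(\Lambda, P)\,\gamma^{b(\Lambda, P)}$ for
each $l \in \{2, \dots, l_{\max}\}$. Moreover

$$\lim_{\gamma\to 0} \|\Lambda^{(\gamma)} - \Lambda\| = 0 , \quad \text{where} \quad \|\Lambda^{(\gamma)} - \Lambda\| \equiv \frac{1}{2}\sum_l |\Lambda^{(\gamma)}_l - \Lambda_l| . \qquad (2.2)$$

The distance $\|\mu - \nu\|$ defined above is often called the 'total variation distance': we refer to
App. A for some properties.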

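Proof: Let $(\mathcal{G}_{\rm s}, \mathcal{G}_{\rm mP})$ be a pair of Tanner graphs distributed as in the coupling of Lemma 1.
Their marginal distributions are, respectively, the standard $(n, \Lambda, P)$ and the multi-Poisson
$(n, \Lambda, P, \gamma)$ ones. Denote by $\hat\Lambda_l(\mathcal{G})$ the fraction of degree-$l$ variable nodes in graph $\mathcal{G}$. Then,
there exist $A$ and $b$ such that

$$\lim_{n\to\infty} P\big[\exists\, l \ \text{s.t.}\ |\hat\Lambda_l(\mathcal{G}_{\rm s}) - \hat\Lambda_l(\mathcal{G}_{\rm mP})| > A\gamma^b\big] = 0 . \qquad (2.3)$$

This follows from Lemma 1 together with the fact that each rewiring induces a change bounded
by $k_{\max}/n$ in the degree profile. Therefore, using the notation $\Lambda^{(n)}_l = E\hat\Lambda_l(\mathcal{G}_{\rm s})$ for the expected
degree profile in the standard ensemble at finite blocklength, we get

$$|\Lambda^{(n,\gamma)}_l - \Lambda^{(n)}_l| = \big|E[\hat\Lambda_l(\mathcal{G}_{\rm s}) - \hat\Lambda_l(\mathcal{G}_{\rm mP})]\big| \le E\big|\hat\Lambda_l(\mathcal{G}_{\rm s}) - \hat\Lambda_l(\mathcal{G}_{\rm mP})\big| \le A\gamma^b + o_n(1) . \qquad (2.4)$$

The first thesis follows by taking the $n \to \infty$ limit.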

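Convergence in total variation distance follows immediately:

$$\|\Lambda^{(\gamma)} - \Lambda\| = \frac{1}{2}\sum_{l} |\Lambda^{(\gamma)}_l - \Lambda_l| \le \frac{1}{2}\sum_{l<l_*} |\Lambda^{(\gamma)}_l - \Lambda_l| + \frac{1}{2}\sum_{l\ge l_*} \Lambda_l + \frac{1}{2}\sum_{l\ge l_*} \Lambda^{(\gamma)}_l =$$

$$= \frac{1}{2}\sum_{l<l_*} |\Lambda^{(\gamma)}_l - \Lambda_l| + \frac{1}{2}\sum_{l<l_*} (\Lambda_l - \Lambda^{(\gamma)}_l) + \sum_{l\ge l_*} \Lambda_l .$$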



4 In App. B we shall prove the existence (in an appropriate sense) of this limit and provide an efficient way of
computing $\Lambda^{(\gamma)}(x)$.
The last expression can be made arbitrarily small by choosing $\gamma$ and $l_*$ appropriately. For
instance, one can choose $l_*$ in such a way that $\sum_{l<l_*} \Lambda_l \ge 1 - \epsilon$, and then $\gamma$ such that
$A\gamma^b \le \epsilon/l_*$. This implies $\|\Lambda^{(\gamma)} - \Lambda\| \le 3\epsilon$. □



3 Decoding Schemes 

In the LDGM case the codeword bits are naturally associated to the check nodes. A string
$\hat x = \{\hat x_a : a \in C\} \in \{0,1\}^m$ is a codeword if and only if there exists an information message
$x = \{x_i : i \in V\} \in \{0,1\}^n$ such that

$$\hat x_a = x_{i^a_1} \oplus \cdots \oplus x_{i^a_k} , \qquad (3.1)$$

for each $a = (\hat a, k) \in C$. Here $\oplus$ denotes the sum modulo 2. Encoding consists in choosing
an information message $x$ with uniform probability distribution and constructing the
corresponding codeword $\hat x$ using the equations (3.1). Notice that, because the code is linear, each
codeword is the image of the same number of information messages. Therefore choosing an
information message uniformly at random is equivalent to choosing a codeword
uniformly at random.

In the LDPC case the codeword bits can be associated to variable nodes. A string $x =
\{x_i : i \in V\} \in \{0,1\}^n$ is a codeword if and only if it satisfies the parity check equations

$$x_{i^a_1} \oplus \cdots \oplus x_{i^a_k} = 0 , \qquad (3.2)$$

for each $a = (\hat a, k) \in C$. In the encoding process we pick a codeword with uniform distribution.

The codeword, chosen according to the above encoding process, is transmitted on a binary-input
output-symmetric channel (BIOS) with output alphabet $\mathcal{A}$ and transition probability
density $Q(y|x)$. In the following we shall use a discrete notation for $\mathcal{A}$. It is straightforward
to adapt the formulas below to the continuous case. If, for instance, $\mathcal{A} = \mathbb{R}$, sums should be
replaced by Lebesgue integrals: $\sum_{y\in\mathcal{A}} \cdot \to \int dy\, \cdot$.

The channel output has the form $\hat y = \{\hat y_a : a \in C\} \in \mathcal{A}^m$ in the LDGM case or $y = \{y_i : i \in
V\} \in \mathcal{A}^n$ in the LDPC case. In order to keep a unified notation for the two cases, we shall introduce
a simple convention which introduces a fictitious output $y$, associated to the variable nodes
(in the LDGM case), or $\hat y$, associated to the check nodes (in the LDPC case). If an LDGM code is
used, $y$ takes by definition a standard value $y = (*, \dots, *)$, while of course $\hat y$ is determined by
the transmitted codeword and the channel realization. The character $*$ should be thought of as
an erasure. If we are considering an LDPC code, $y$ is the channel output, while $\hat y$ takes a standard
value $\hat y = (0, \dots, 0)$.

We will focus on the probability distribution $P(x|y, \hat y)$ of the vector $x$, conditional on the
channel output $(y, \hat y)$. Depending upon the family of codes employed (whether LDGM or
LDPC), this distribution has different meanings. It is the distribution of the information
message in the LDGM case, and the distribution of the transmitted codeword in the LDPC
case. It can always be written in the form

$$P(x|y, \hat y) = \frac{1}{Z(y, \hat y)} \prod_{a\in C} Q_C(\hat y_a | x_{i^a_1} \oplus \cdots \oplus x_{i^a_k}) \prod_{i\in V} Q_V(y_i|x_i) . \qquad (3.3)$$

The precise form of the functions $Q_C(\cdot|\cdot)$ and $Q_V(\cdot|\cdot)$ depends upon the family of codes:
Figure 1: Factor graph representation of the probability distribution of the transmitted codeword
(information message for LDGM codes) conditional on the channel output, cf. Eq. (3.3). Notice that, for the
sake of compactness, the $y$ arguments have been omitted.



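For LDGM's:

$$Q_C(\hat y|\hat x) = Q(\hat y|\hat x) , \qquad Q_V(y|x) = \begin{cases} 1 & \text{if } y = * , \\ 0 & \text{otherwise.} \end{cases} \qquad (3.4)$$

For LDPC's:

$$Q_C(\hat y|\hat x) = \begin{cases} 1 & \text{if } \hat y = \hat x , \\ 0 & \text{otherwise,} \end{cases} \qquad Q_V(y|x) = Q(y|x) . \qquad (3.5)$$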

The probability distribution (3.3) can be conveniently represented in the factor graph language
[31], see Fig. 1. There are two types of factor nodes in such a graph: nodes corresponding to
$Q_V(\cdot|\cdot)$ terms on the left, and nodes corresponding to $Q_C(\cdot|\cdot)$ on the right. It is also useful
to introduce a specific notation for the expectation with respect to the distribution (3.3). If
$F : \{0,1\}^V \to \mathbb{R}$ is a function of the codeword (information message for LDGM codes), we
define:

$$\langle F(x) \rangle \equiv \sum_{x} F(x)\, P(x|y, \hat y) . \qquad (3.6)$$

In the proof of our main result, see Sec. 6, it will also be useful to consider several i.i.d. copies
of $x$, each one having a distribution of the form (3.3). If $F : \{0,1\}^V \times \cdots \times \{0,1\}^V \to \mathbb{R}$
is a real function of $q$ such copies, we denote its expectation value by

$$\langle F(x^{(1)}, \dots, x^{(q)}) \rangle \equiv \sum_{x^{(1)} \dots x^{(q)}} F(x^{(1)}, \dots, x^{(q)})\, P(x^{(1)}|y, \hat y) \cdots P(x^{(q)}|y, \hat y) . \qquad (3.7)$$

We are interested in two different decoding schemes: bit MAP (for short MAP hereafter)
and iterative belief propagation (BP) decoding. In MAP decoding we follow the rule:

$$x^{\rm MAP}_i = \arg\max_{x_i} P(x_i|y, \hat y) , \qquad (3.8)$$

where $P(x_i|y, \hat y)$ is obtained by marginalizing the probability distribution (3.3) over $\{x_j ;\ j \neq i\}$.
BP decoding [32] is a message passing algorithm. The received message is encoded in terms
of log-likelihoods $\{h_i : i \in V\}$ and $\{J_a : a \in C\}$ as follows (notice the unusual normalization):

$$h_i = \frac{1}{2} \log \frac{Q_V(y_i|0)}{Q_V(y_i|1)} , \qquad J_a = \frac{1}{2} \log \frac{Q_C(\hat y_a|0)}{Q_C(\hat y_a|1)} . \qquad (3.9)$$

The messages $\{u_{a\to i}, v_{i\to a}\}$ are updated recursively following the rule

$$u_{a\to i} := \operatorname{arctanh}\Big\{ \tanh J_a \prod_{j\in\partial a\setminus i} \tanh v_{j\to a} \Big\} , \qquad (3.10)$$

$$v_{i\to a} := h_i + \sum_{b\in\partial i\setminus a} u_{b\to i} . \qquad (3.11)$$

After iterating Eqs. (3.10), (3.11) a certain fixed number of times, all the incoming messages
to a certain bit $x_i$ are used to estimate its value.
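
To make the message-passing rule concrete, here is a minimal sketch of one sweep: every check-to-variable message is recomputed from the hyperbolic-tangent product rule, then every variable-to-check message is recomputed as the local log-likelihood plus the other incoming check messages. The data layout (a list of neighbour lists and dictionaries keyed by edges), the names `bp_sweep` and `atanh_ext`, and the toy instance are assumptions for illustration. Setting the check log-likelihoods to $+\infty$ is likewise an assumption, modelling parity checks that are observed without noise.

```python
import math

def atanh_ext(t):
    """arctanh extended by arctanh(+1) = +inf and arctanh(-1) = -inf."""
    if t >= 1.0:
        return math.inf
    if t <= -1.0:
        return -math.inf
    return math.atanh(t)

def bp_sweep(checks, h, J, v):
    """Apply the check-node update to every edge, then the variable-node update.

    checks[a] lists the (distinct) variable nodes adjacent to check a;
    h[i] and J[a] are half log-likelihood ratios; v[(i, a)] holds the current
    variable-to-check messages. Returns the new messages (u, v_new).
    """
    u = {}
    for a, nbrs in enumerate(checks):
        for i in nbrs:
            prod = math.tanh(J[a])
            for j in nbrs:
                if j != i:
                    prod *= math.tanh(v[(j, a)])
            u[(a, i)] = atanh_ext(prod)
    v_new = {}
    for a, nbrs in enumerate(checks):
        for i in nbrs:
            v_new[(i, a)] = h[i] + sum(
                u[(b, i)] for b, other in enumerate(checks) if b != a and i in other
            )
    return u, v_new

# Hypothetical toy instance: two noiseless parity checks sharing variable node 1.
checks = [[0, 1, 2], [1, 3]]
h = [0.5, -0.2, 0.8, 0.3]
J = [math.inf, math.inf]
v0 = {(i, a): h[i] for a, nbrs in enumerate(checks) for i in nbrs}
u1, v1 = bp_sweep(checks, h, J, v0)
```

In this toy instance, node 1 receives from the second check a message equal to the log-likelihood of node 3, so its message to the first check becomes $-0.2 + 0.3 = 0.1$.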

In the following, we shall denote by $E_C$ the expectation with respect to one of the code
ensembles defined in Section 2. Which ensemble is meant will be clear from
the context. We will denote by $E_y$ the expectation with respect to the received message $(y, \hat y)$,
assuming the transmitted one to be the all-zero codeword (or, in some cases, with respect to
one of the received symbols).



4 Random variables 

It is convenient to introduce some notation and a few simple results for handling some particular
classes of random variables. Since these random variables will appear several times in the
following, this introduction will help to keep the presentation more compact. Here, as in the
other Sections, we shall sometimes use the standard convention of denoting random variables
by upper case letters (e.g. $X$) and deterministic values by lower case letters (e.g. $x$). Moreover,
we use the symbol $\stackrel{\rm d}{=}$ to denote identity in distribution (i.e. $X \stackrel{\rm d}{=} Y$ if $X$ and $Y$ have the same
distribution).

The most important class of random variables in the present paper is the following.

Definition 5 A random variable $X$ is said to be symmetric (and we write $X \in \mathcal{S}$) if it takes
values in $(-\infty, +\infty]$ and

$$E_X[g(X)] = E_X[e^{-2X} g(-X)] , \qquad (4.1)$$

for any real function $g$ such that at least one of the expectation values exists.

This definition was already given in [33]. Notice however that the two definitions differ by
a factor 2 in the normalization of $X$. The introduction of symmetric random variables is
motivated by the following observations [33]:

1. If $Q(y|x)$ is the transition probability of a BIOS and $y$ is distributed according to $Q(y|0)$,
then the log-likelihood

$$\ell(y) = \frac{1}{2} \log \frac{Q(y|0)}{Q(y|1)} \qquad (4.2)$$

is a symmetric random variable. In particular the log-likelihoods $\{h_i : i \in V\}$ and
$\{J_a : a \in C\}$ defined in (3.9) are symmetric random variables.
2. Two important symmetric variables are: (i) $X = 0$ with probability 1; (ii) $X = +\infty$ with
probability 1. Both can be considered as particular examples of the previous observation.
Just take an erasure channel with erasure probability 1 (case i) or 0 (case ii).

3. If $X$ and $Y$ are symmetric then $Z = X + Y$ is symmetric. Here the sum is extended to
the domain $(-\infty, +\infty]$ by imposing the rule $x + \infty = +\infty$.

4. If $X$ and $Y$ are symmetric then $Z = \operatorname{arctanh}(\tanh X \tanh Y)$ is symmetric. The functions
$x \mapsto \tanh x$ and $x \mapsto \operatorname{arctanh} x$ are extended to the domains (respectively) $(-\infty, +\infty]$
and $(-1, +1]$ by imposing the rules $\tanh(+\infty) = +1$ and $\operatorname{arctanh}(+1) = +\infty$.

As a consequence of the above observations, if the messages $\{u_{a\to i}, v_{i\to a}\}$ in BP decoding are
initialized to some symmetric random variable, they remain symmetric at each step of the decoding
procedure. This follows directly from the update equations (3.10) and (3.11).
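
The observations above can be checked numerically on a simple case. The sketch below assumes a binary symmetric channel with flip probability $p$ (a choice made here for illustration), builds the two-point distribution of the half log-likelihood ratio given that 0 is sent, and evaluates the difference between the two sides of the symmetry condition for a few test functions, both for the channel log-likelihood itself and for the sum and the arctanh-tanh combination of two independent copies. All function names are hypothetical.

```python
import math
from itertools import product

def bsc_half_llr(p):
    """Distribution of X = (1/2) log[Q(y|0)/Q(y|1)] when 0 is sent over a BSC(p)."""
    x0 = 0.5 * math.log((1 - p) / p)
    return [(x0, 1 - p), (-x0, p)]

def combine(dist_x, dist_y, op):
    """Distribution of op(X, Y) for independent X and Y given as (value, prob) lists."""
    return [(op(x, y), px * py) for (x, px), (y, py) in product(dist_x, dist_y)]

def symmetry_gap(dist, g):
    """|E g(X) - E e^{-2X} g(-X)|, which vanishes for a symmetric variable."""
    lhs = sum(p * g(x) for x, p in dist)
    rhs = sum(p * math.exp(-2 * x) * g(-x) for x, p in dist)
    return abs(lhs - rhs)

X = bsc_half_llr(0.1)
var_sum = combine(X, X, lambda x, y: x + y)
chk = combine(X, X, lambda x, y: math.atanh(math.tanh(x) * math.tanh(y)))
test_functions = [math.tanh, lambda x: x * x, lambda x: math.log1p(math.tanh(x))]
gaps = [symmetry_gap(d, g) for d in (X, var_sum, chk) for g in test_functions]
```

All entries of `gaps` should vanish up to floating-point rounding, in agreement with observations 1, 3 and 4.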

Remarkably, the family of symmetric variables is 'stable' also under MAP decoding. This
is stated more precisely in the result below [30].

Lemma 2 Let $P(x|y, \hat y)$ be the probability distribution of the channel input (information message)
$x = (x_1 \dots x_n)$, conditional on the channel output $y = (y_1 \dots y_n)$, $\hat y = (\hat y_1 \dots \hat y_m)$, as given
in Eq. (3.3). Assume the channel is BIOS and $x$ is the codeword of (is coded using a) linear
code. Let $\underline{i} \equiv \{i_1 \dots i_k\} \subseteq [n]$ and define

$$\ell_{\underline{i}}(y) = \frac{1}{2} \log \frac{P(x_{i_1} \oplus \cdots \oplus x_{i_k} = 0 | y)}{P(x_{i_1} \oplus \cdots \oplus x_{i_k} = 1 | y)} . \qquad (4.3)$$

If $y$ is distributed according to the channel, conditional on the all-zero codeword being
transmitted, then $\ell_{\underline{i}}(y)$ is a symmetric random variable.

To complete our brief review of properties of symmetric random variables, it is useful to
collect a few identities to be used several times in the following (throughout the paper we use
$\log$ and $\log_2$ to denote, respectively, natural and base 2 logarithms).

Lemma 3 Let $X$ be a symmetric random variable. Then the following identities hold

$$E_X \tanh^{2k-1} X = E_X \tanh^{2k} X = E_X \frac{\tanh^{2k} X}{1 + \tanh X} , \quad \text{for } k \ge 1 , \qquad (4.4)$$

$$E_X \log(1 + \tanh X) = \sum_{k=1}^{\infty} \left( \frac{1}{2k-1} - \frac{1}{2k} \right) E_X \tanh^{2k} X . \qquad (4.5)$$

Proof: The identities (|4.4f) follow from the observation that, because of Eq. (|4.1[) . we 

have 

E g(X) = ±E [g(X) + e - 2X g(-X)} . (4.6) 

The desired result is obtained by substituting either g(X) = tanh 2fc A or g(X) = tanh 2fc_1 A. 
In order to get l|4.5jl . apply the identity (|4.6|) to g{X) = log(l + tanh A) and get 

Elog(l + t) = iE I l^[(l + t)log(l + t) + (l-t)log(l-t)]= (4.7) 

= E y(-i— , (4.8) 

k=i v 7 
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where we introduced the shorthand t = tanhX. At this point you can switch sum and 
expectation because of the monotone convergence theorem. □ 
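
As a sanity check, the identities (4.4) and (4.5) can be verified numerically on a simple symmetric variable. The sketch below assumes the log-likelihood of a binary symmetric channel with flip probability p as the example (it takes the values ±a with a = (1/2) log((1-p)/p)); the function names are hypothetical, and the series in (4.5) is truncated at a finite number of terms.

```python
import math

def bsc_llr(p):
    """Symmetric variable: log-likelihood of a BSC(p) output given that 0 was sent."""
    a = 0.5 * math.log((1 - p) / p)
    return [(a, 1 - p), (-a, p)]

def expect(dist, g):
    """Expectation of g(X) for a finite distribution given as (value, prob) pairs."""
    return sum(prob * g(x) for x, prob in dist)

def lemma3_moments(dist, k):
    """The three quantities that Eq. (4.4) claims to be equal."""
    odd = expect(dist, lambda x: math.tanh(x) ** (2 * k - 1))
    even = expect(dist, lambda x: math.tanh(x) ** (2 * k))
    ratio = expect(dist, lambda x: math.tanh(x) ** (2 * k) / (1 + math.tanh(x)))
    return odd, even, ratio

def log_identity(dist, terms=400):
    """Left- and right-hand side of Eq. (4.5), with the series truncated."""
    lhs = expect(dist, lambda x: math.log(1 + math.tanh(x)))
    rhs = sum((1 / (2 * k - 1) - 1 / (2 * k))
              * expect(dist, lambda x, k=k: math.tanh(x) ** (2 * k))
              for k in range(1, terms + 1))
    return lhs, rhs
```

For p = 0.1 one has tanh a = 0.8, so the terms of the series decay like 0.64^k and a few hundred terms are more than enough.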



The space of symmetric random variables is useful because log-likelihoods (for our type of 
problems) naturally belong to this space. It is also useful to have a compact notation for the 
distribution of a binary variable x whose log-likelihood is $u \in (-\infty, +\infty]$. We therefore define 

$$P_u(x) = \begin{cases} (1+e^{-2u})^{-1} , & \text{if } x = 0, \\ (1+e^{-2u})^{-1}\, e^{-2u} , & \text{if } x = 1. \end{cases}$$
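
A minimal sketch of this notation in code, assuming moderate values of u (for very negative u the exponential overflows in floating point); the function name `p_u` is our own.

```python
import math

def p_u(u):
    """Return the pair (P_u(0), P_u(1)) for a log-likelihood u in (-inf, +inf]."""
    if u == math.inf:
        return (1.0, 0.0)      # the bit is known to be 0
    w = math.exp(-2.0 * u)     # ratio P_u(1) / P_u(0)
    return (1.0 / (1.0 + w), w / (1.0 + w))
```

By construction, (1/2) log[P_u(0)/P_u(1)] gives back u, and u = 0 corresponds to a uniformly distributed bit.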

5 Concentration 

Our main object of interest will be the entropy per input symbol of the transmitted message 
conditional to the received one (we shall generically measure entropies in bits). We take the 
average of this quantity with respect to the code ensemble to define 

$$h_n = \frac{1}{n}\,\mathbb{E}_C H_n(X|Y,\hat{Y}) \tag{5.1}$$

$$\phantom{h_n} = \frac{1}{n}\,\mathbb{E}_C\mathbb{E}_y \log_2 Z(y,\hat{y}) - \frac{\mathbb{E}_C m}{n}\sum_{\hat{y}} Q_C(\hat{y}|0)\log_2 Q_C(\hat{y}|0) - \sum_{y} Q_V(y|0)\log_2 Q_V(y|0) . \tag{5.2}$$

In passing from Eq. (5.1) to (5.2), we exploited the symmetry of the channel, and fixed the 
transmitted message to be the all-zero codeword. 

Intuitively speaking, the conditional entropy appearing in Eq. (5.1) allows one to estimate the 
typical number of inputs with non-negligible probability for a given channel output. The 
most straightforward rigorous justification for looking at the conditional entropy is provided 
by Fano's inequality [34], which we recall here. 

Lemma 4 Let $P_B(C)$ ($P_b(C)$) be the block (bit) error probability for a code C having blocklength 
n and rate r. Then 

1. $P_B(C) \ge [H_n(X|Y,\hat{Y}) - 1]/(nr)$. 

2. $h(P_b(C)) \ge H_n(X|Y,\hat{Y})/n$. 

Here $h(x) = -x\log_2 x - (1-x)\log_2(1-x)$ denotes the binary entropy function. 

The rationale for taking the expectation with respect to the code in the definition (5.1) is 
the following concentration result. 

Theorem 1 Let $H_n(X|Y,\hat{Y})$ be the conditional entropy for a code drawn from any of the ensembles 
LDGM(···) or LDPC(···) defined in Section 2 when used to communicate over a binary 
memoryless channel. Then there exist two (n-independent) constants A, B > 0 such that, for 
any $\varepsilon > 0$, 

$$\mathbb{P}_C\{|H_n(X|Y,\hat{Y}) - n h_n| > n\varepsilon\} \le A\, e^{-nB\varepsilon^2} . \tag{5.3}$$
Here $\mathbb{P}_C$ denotes the probability with respect to the code realization. 
In particular this result implies that, if $h_n$ is bounded away from zero in the $n \to \infty$ limit, 
then, with high probability with respect to the code realization, the bit error rate is bounded 
away from 0. The converse (namely that $\lim_{n\to\infty} h_n = 0$ implies $\lim_{n\to\infty} \mathbb{P}_C[P_b(C) > \delta] = 0$) 
is in general false. However, for many cases of interest (in particular for LDPC ensembles) 
we expect this to be the case. This point is discussed further elsewhere in the paper. 

Proof: We use an Azuma inequality [35] argument similar to the one adopted by 
Richardson and Urbanke to prove concentration under message passing decoding [3]. 

Notice that the code-dependent contribution to $H_n(X|Y,\hat{Y})$ is $\mathbb{E}_y\log_2 Z(y,\hat{y})$, cf. Eq. 
(5.2). We are therefore led to construct a Doob martingale as follows. First of all fix $m_{\max} = (1+\delta)\,\mathbb{E}_C m$, 
and condition on $m \le m_{\max}$ (m being the number of right nodes). A code C in this 
'constrained' ensemble can be thought of as a sequence of $m_{\max}$ random variables $c_1, \dots, c_{m_{\max}}$. 
The variable $c_t = (k, i)$ consists of the degree k of the t-th check node, plus the list $i = \{i_1 \dots i_k\}$ 
of adjacent variable nodes on the Tanner graph. We adopt the convention $c_t = *$ if $t > m$. Let 
$C_t = (c_1, \dots, c_t)$ and define the random variables 

$$X_t = \mathbb{E}_C\left[\,\mathbb{E}_y\log_2 Z(y,\hat{y})\,\middle|\,C_t,\ m \le m_{\max}\right] , \quad t = 0, 1, \dots, m_{\max} . \tag{5.4}$$

By construction the sequence $X_0, \dots, X_{m_{\max}}$ forms a martingale: $\mathbb{E}[X_t\,|\,X_0 \dots X_{t-1}] = X_{t-1}$. 
In order to apply Azuma inequality we need an upper bound on the differences 
$|X_t - X_{t-1}|$. Consider two Tanner graphs which differ in a unique check node and let $\Delta$ 
be a uniform upper bound on the difference in $\mathbb{E}_y\log_2 Z(y,\hat{y})$ between these two graphs. Since 
$X_{t-1}$ is the expectation of $X_t$ with respect to the choice of the t-th check node, $|X_t - X_{t-1}| \le \Delta$ 
as well. In order to derive such an upper bound $\Delta$, we shall compare two graphs differing in 
a unique check node with the graph obtained by erasing this node. 

More precisely, consider a Tanner graph in the ensemble having m check nodes, and channel 
output $y, \hat{y}$. Now add a single check node, to be denoted as 0. Let $\hat{y}_0$ be the corresponding 
observed value, and $i^0_1 \dots i^0_k$ the positions of the adjacent bits. In the LDGM case $\hat{y}_0$ will be 
drawn from the channel distribution, while it is fixed to 0 if the code is an LDPC. Evaluate the 
difference of the corresponding partition functions. We claim that 

$$\mathbb{E}_{\hat{y}_0}\mathbb{E}_y\log_2 Z_{m+1}(y,\hat{y},\hat{y}_0) - \mathbb{E}_y\log_2 Z_m(y,\hat{y}) = \mathbb{E}_{\hat{y}_0}\mathbb{E}_y\log_2\langle Q_C(\hat{y}_0|x_{i^0_1}\oplus\cdots\oplus x_{i^0_k})\rangle , \tag{5.5}$$

where $\langle\,\cdot\,\rangle$ denotes the expectation with respect to the distribution (3.3) for the m-check nodes 
code. In order to prove such a formula, we write explicitly $\log_2\langle Q_C(\hat{y}_0|x_{i^0_1}\oplus\cdots\oplus x_{i^0_k})\rangle$ using 
Eq. (3.3). We get 

$$\log_2\left\{\frac{1}{Z_m(y,\hat{y})}\sum_{x}\prod_{a\in[m]} Q_C(\hat{y}_a|x_{i^a_1}\oplus\cdots\oplus x_{i^a_k})\prod_{i\in[n]} Q_V(y_i|x_i)\cdot Q_C(\hat{y}_0|x_{i^0_1}\oplus\cdots\oplus x_{i^0_k})\right\} . \tag{5.6}$$

Next we apply the definition of $Z_{m+1}(y,\hat{y},\hat{y}_0)$, and take expectation with respect to $y$, $\hat{y}$ and 
$\hat{y}_0$. 

Using the definitions (4.2) and (4.3) we obtain 

$$\langle Q_C(\hat{y}_0|x_{i_1}\oplus\cdots\oplus x_{i_k})\rangle = \frac{1}{2}\left[1 + \tanh\ell(\hat{y}_0)\,\tanh\ell_i(y)\right] , \tag{5.7}$$

with $i = (i^0_1 \dots i^0_k)$. It follows therefore from Lemma 3 that 

$$-1 \le \mathbb{E}_{\hat{y}_0}\mathbb{E}_y\log_2\langle Q_C(\hat{y}_0|x_{i_1}\oplus\cdots\oplus x_{i_k})\rangle \le 0 . \tag{5.8}$$
Therefore the difference in conditional entropy between two Tanner graphs which differ in a 
unique check node is at most 2 bits (one bit for removing the check node plus one bit for 
adding it in a different position). Arguing as above, this yields $|X_{t+1} - X_t| \le 2$ and Azuma 
inequality implies 

$$\mathbb{P}_C\{|H_n(X|Y,\hat{Y}) - n h_n| > n\varepsilon\,|\,m \le m_{\max}\} \le A_1\, e^{-nB_1\varepsilon^2} , \tag{5.9}$$

with $A_1 = 2$ and $B_1 = P'(1)/[8\gamma(1+\delta)]$. 

In the case of standard ensembles, we are done, because the condition $m \le m_{\max} = (1+\delta)\,\mathbb{E}_C m$ 
holds surely (m is a deterministic quantity). For Poisson and multi-Poisson ensembles, 
we still have to show that the event $m > m_{\max}$ does not modify significantly the estimate 
(5.9). From elementary theory of Poisson random variables we know that, for any $\delta > 0$ there 
exist $A_2, B_2 > 0$ such that $\mathbb{P}_C[m > m_{\max}] \le A_2\, e^{-nB_2\delta^2}$. Notice that 

$$\mathbb{P}_C\{|H_n(X|Y,\hat{Y}) - n h_n| > n\varepsilon\} \le \mathbb{P}_C\{|H_n(X|Y,\hat{Y}) - n h_n| > n\varepsilon\,|\,m \le m_{\max}\} + \mathbb{P}_C[m > m_{\max}] \tag{5.10}$$

$$\le A_1\, e^{-nB_1\varepsilon^2} + A_2\, e^{-nB_2\delta^2} . \tag{5.11}$$
The thesis is obtained by choosing $\delta = \varepsilon$, $B = \min(B_1, B_2)$, and $A = A_1 + A_2$. □ 
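
For the reader's convenience, the arithmetic behind the constant $B_1$ can be sketched as follows. We use the standard Azuma-Hoeffding inequality for a martingale with bounded increments; the symbols c (increment bound), N (number of steps) and a (deviation) are our own labels, and we assume $\mathbb{E}_C m = n\gamma/P'(1)$, as is the case for the Poisson ensemble.

```latex
% Azuma-Hoeffding: martingale X_0, ..., X_N with |X_t - X_{t-1}| <= c
\mathbb{P}\{|X_N - X_0| \ge a\} \le 2\exp\left(-\frac{a^2}{2 N c^2}\right) .
% Here c = 2, N = m_max = (1+\delta) n \gamma / P'(1), a = n \varepsilon, so that
\frac{a^2}{2 N c^2}
  = \frac{n^2\varepsilon^2\, P'(1)}{8\,(1+\delta)\, n\gamma}
  = n\,\frac{P'(1)}{8\gamma(1+\delta)}\,\varepsilon^2
  = n B_1 \varepsilon^2 .
```

The prefactor 2 in the inequality is the constant $A_1$ of Eq. (5.9).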



6 Main result and proof for Poisson ensembles 

As briefly mentioned in the Introduction, the main ideas in the proof are most clearly described 
in the context of simple Poisson ensembles. We shall therefore discuss them in detail here, and 
will be more succinct when using similar arguments for multi-Poisson ensembles. Some of the 
calculations and technical details are deferred to the appendices. 

In order to state our main result in compact form, it is useful to introduce two infinite 
families $\{U_A\}$, $\{V_B\}$ of i.i.d. random variables. The indices A and B run over index sets large 
enough to include all the cases occurring in the paper. We adopt the convention that any two variables of 
these families carrying distinct subscripts are independent. The distribution of the U and V 
variables shall moreover satisfy a couple of requirements specified in the definition below. 

Definition 6 Fix a degree sequence pair $(\Lambda, P)$, and let $\rho_k$ be the edge perspective right degree 
distribution $\rho_k = kP_k/P'(1)$. Let $V_1, V_2, \dots \stackrel{d}{=} V$ be a family of i.i.d. symmetric random 
variables, k an integer with distribution $\rho_k$, J a symmetric random variable distributed as the 
log-likelihoods $\{J_a\}$ of the parity check bits, cf. Eq. (3.9), and 

$$U_V \stackrel{d}{=} \operatorname{arctanh}\left[\tanh J\,\prod_{i=1}^{k-1}\tanh V_i\right] . \tag{6.1}$$

The random variables U, V are said to be admissible if they are independent, symmetric and 
$U \stackrel{d}{=} U_V$. 

For any couple of admissible random variables U, V, we define the associated trial entropy 
as follows 

$$\phi_V(\Lambda,P) = -\Lambda'(1)\,\mathbb{E}_{U,V}\log_2\Big[\sum_x P_U(x)P_V(x)\Big] + \mathbb{E}_l\mathbb{E}_y\mathbb{E}_{\{U_i\}}\log_2\Big[\sum_x\frac{Q_V(y|x)}{Q_V(y|0)}\prod_{i=1}^{l}P_{U_i}(x)\Big]$$

$$\qquad + \frac{\Lambda'(1)}{P'(1)}\,\mathbb{E}_k\mathbb{E}_{\hat{y}}\mathbb{E}_{\{V_i\}}\log_2\Big[\sum_{x_1\dots x_k}\frac{Q_C(\hat{y}|x_1\oplus\cdots\oplus x_k)}{Q_C(\hat{y}|0)}\prod_{i=1}^{k}P_{V_i}(x_i)\Big] , \tag{6.2}$$

where l and k are two integer random variables with distribution (respectively) $\Lambda_l$ and $P_k$. 
Hereafter we shall drop the reference to the degree distributions in $\phi_V(\Lambda, P)$ whenever this is 
clear from the context. 

Figure 2: A communication scheme interpolating between an LDGM code and an irregular repetition 
code. Notice that the repetition part is transmitted through a different (effective) channel. 

Notice that, in the notation for the trial entropy, we put in evidence its dependence just on 
the distribution of the V-variables. In fact we shall think of the distribution of the U-variables 
as being determined by the relation $U \stackrel{d}{=} U_V$, see Eq. (6.1). 
The main result of this paper is stated below. 

Theorem 2 Let P(x) be a polynomial with non-negative coefficients such that $P(0) = P'(0) = 0$, 
and assume that P(x) is convex for $x \in [-x_0, x_0]$. 

1. Let $h_n$ be the expected conditional entropy per bit for either of the Poisson ensembles 
LDGM(n, γ, P) or LDPC(n, γ, P). If $x_0 \ge 1$, then 

$$h_n \ge \sup_V \phi_V(\Lambda, P) . \tag{6.3}$$

2. Let $h_n$ be the expected conditional entropy per bit for either of the multi-Poisson ensembles 
LDGM(n, γ, Λ, P) or LDPC(n, γ, Λ, P). If $x_0 \ge e$, then 

$$\liminf_{n\to\infty} h_n \ge \sup_V \phi_V(\Lambda^{(\gamma)}, P) . \tag{6.4}$$

3. Let $h_n$ be the expected conditional entropy per bit for either of the standard ensembles 
LDGM(n, Λ, P) or LDPC(n, Λ, P). If $x_0 \ge e$, then 

$$\liminf_{n\to\infty} h_n \ge \sup_V \phi_V(\Lambda, P) . \tag{6.5}$$

Here the sup has to be taken over the space of admissible random variables. 

Proof [Poisson ensembles]: Computing the conditional entropy (5.2) is difficult because 
the probability distribution (3.3) does not factorize over the bits $\{x_i\}$. Guerra's idea [23] 
consists in interpolating between the original distribution (3.3) and a much simpler one which 
factorizes over the bits. In the LDPC case, this corresponds to progressively removing parity 
check conditions (3.2) from the code definition. For LDGM's, it amounts to removing bits from 
the codewords (3.1). In both cases the design rate is augmented. In order to compensate for the 
increase in conditional entropy implied by this transformation we imagine re-transmitting some 
of the bits $\{x_i\}$ through an 'effective' channel. It turns out that the transition probability of 
this effective channel can be chosen in such a way as to 'match' the transformation of the original 
code. 

Figure 3: Action of the interpolation procedure on the factor graph representing the probability distribution 
of the channel input (or, in LDGM case, the information message) conditional to the channel output. 
For the sake of simplicity we dropped the arguments $y_i$ (for $Q_V$), $\hat{y}_a$ (for $Q_C$) and $z_{a_i\to i}$ (for $Q_{\rm eff}$). 

In practice, for any $s \in [0, \gamma]$ we define the interpolating system as follows. We construct a 
code from the same ensemble family as the original one, but with modified parameters (n, s, P). 
Notice that both the block length and the right degree distribution remain unchanged. The 
design rate, on the other hand, has been increased to $r_{\rm des}(s) = P'(1)/s$ (in the LDGM case) or 
$r_{\rm des}(s) = 1 - s/P'(1)$ (in the LDPC case). A codeword from this code is transmitted through 
the original channel, with transition matrix $Q(\,\cdot\,|\,\cdot\,)$. It is useful to denote by $C_s$ the set of 
factor nodes for a given value of s. Of course $C_s$ is a random variable. 

As anticipated, we must compensate for the rate increase produced by the above procedure. We therefore 
repeat each bit $x_i$ $l_i$ times and transmit it through an auxiliary channel with transition 
probability $Q_{\rm eff}(\,\cdot\,|\,\cdot\,)$. The $l_i$'s are taken to be independent Poisson random variables with 
parameter $(\gamma - s)$. We can therefore think of this construction as a code formed of two parts: a code 
from the LDGM(n, s, P) or LDPC(n, s, P) ensemble, plus an irregular repetition code. Each 
part of the code is transmitted through a different channel. In Fig. 2 we present a schematic 
description of this two-part coding system (the scheme refers to the LDGM case). 
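
The bookkeeping of the interpolating system can be summarized in a few lines of code. This is only a sketch of the expected sizes implied by the text: we assume that the expected number of check nodes is $ns/P'(1)$ (which is what the stated design rates amount to) and that each bit is repeated a Poisson($\gamma - s$) number of times on average. The function and key names are hypothetical.

```python
def interpolating_parameters(n, s, gamma, p_prime_1, kind="LDPC"):
    """Expected sizes and design rate of the interpolating system at parameter s.

    n         : block length
    s         : interpolation parameter, 0 <= s <= gamma
    gamma     : Poisson parameter of the original ensemble
    p_prime_1 : P'(1), the average check node degree
    kind      : "LDPC" or "LDGM"
    """
    if not 0.0 <= s <= gamma:
        raise ValueError("s must lie in [0, gamma]")
    expected_checks = n * s / p_prime_1        # assumed mean number of check nodes
    expected_repetitions = n * (gamma - s)     # mean number of bits sent through Q_eff
    if kind == "LDGM":
        design_rate = p_prime_1 / s if s > 0 else float("inf")
    elif kind == "LDPC":
        design_rate = 1.0 - s / p_prime_1
    else:
        raise ValueError("kind must be 'LDPC' or 'LDGM'")
    return {
        "expected_checks": expected_checks,
        "expected_repetitions": expected_repetitions,
        "design_rate": design_rate,
    }
```

At s = γ there are no repetitions and the original ensemble is recovered; at s = 0 there are no check nodes and only the repetition part survives.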

The received message will have the form $(y,\hat{y},z)$, with $z = \{z_{a_i\to i}\}$, $a_i \in [l_i]$. We can write 
the conditional probability for $x = (x_1 \dots x_n)$ to be the transmitted codeword conditional to 
the output $(y,\hat{y},z)$ of the channel as follows: 

$$P(x|y,\hat{y},z) = \frac{1}{Z_s}\prod_{a\in C_s} Q_C(\hat{y}_a|x_{i^a_1}\oplus\cdots\oplus x_{i^a_k})\prod_{i\in V} Q_V(y_i|x_i)\prod_{i\in V}\prod_{a_i=1}^{l_i} Q_{\rm eff}(z_{a_i\to i}|x_i) , \tag{6.6}$$

with $Z_s = Z(y,\hat{y},z)$ fixed by the normalization condition of $P(x|y,\hat{y},z)$. Notice that, for $s = \gamma$ 
the original distribution (3.3) is recovered since $l_i = 0$ for any $i \in V$. On the other hand, if 
$s = 0$, then $m = 0$ and the Tanner graph contains no check nodes: $C_0 = \emptyset$. We are left with a 
simple irregular repetition code. The action of the interpolation procedure on the factor graph 
is depicted in Fig. 3. 

The following steps in the proof will depend upon the effective channel transition probability 
$Q_{\rm eff}(\,\cdot\,|\,\cdot\,)$ only through its log-likelihood distribution. We therefore define the random variable 
U as 

$$U \stackrel{d}{=} \frac{1}{2}\log\frac{Q_{\rm eff}(Z|0)}{Q_{\rm eff}(Z|1)} , \tag{6.7}$$

where Z is distributed according to $Q_{\rm eff}(z|0)$. Notice that U is symmetric and that, for any 
symmetric U we can construct at least one BIOS channel whose log-likelihood is distributed 
as U [3]. Hereafter we shall consider $Q_{\rm eff}(\,\cdot\,|\,\cdot\,)$ to be determined by such a construction. When 
necessary, we shall adopt a discrete notation for the output alphabet of the effective channel 
$Q_{\rm eff}(\,\cdot\,|\,\cdot\,)$. However, our derivation is equally valid for a continuous output alphabet. 

It is natural to consider the conditional entropy-per-bit of the interpolating model. With 
an abuse of notation we denote by $\mathbb{E}_C$ the expectation both with respect to the code ensemble 
and with respect to the $l_i$'s. We define 

$$h_n(s) = \frac{1}{n}\,\mathbb{E}_C H_n(X|Y,\hat{Y},Z) \tag{6.8}$$

$$\phantom{h_n(s)} = \frac{1}{n}\,\mathbb{E}_C\mathbb{E}_{y,z}\log_2 Z(y,\hat{y},z) - \frac{s}{P'(1)}\sum_{\hat{y}} Q_C(\hat{y}|0)\log_2 Q_C(\hat{y}|0) - \sum_{y} Q_V(y|0)\log_2 Q_V(y|0) - (\gamma - s)\sum_{z} Q_{\rm eff}(z|0)\log_2 Q_{\rm eff}(z|0) . \tag{6.9}$$

Notice that $h_n(s)$ depends implicitly upon the distribution of U. In passing from Eq. (6.8) to 
Eq. (6.9) we used the symmetry condition for the two channels and assumed the transmitted 
message to be the all-zero codeword. 

It is easy to compute the conditional entropy (6.8) at the extremes of the interpolation 
interval. For $s = \gamma$ we recover the original probability distribution (3.3) and the conditional 
entropy $h_n$. When $s = 0$, on the other hand, the factor graph no longer contains function nodes 
of degree larger than one and the partition function can be computed explicitly. Therefore we 
have 

$$h_n(\gamma) = h_n , \tag{6.10}$$

$$h_n(0) = \mathbb{E}_{y,z}\mathbb{E}_l\log_2\Big[\sum_x\frac{Q_V(y|x)}{Q_V(y|0)}\prod_{a=1}^{l}Q_{\rm eff}(z_a|x)\Big] - \gamma\sum_z Q_{\rm eff}(z|0)\log_2 Q_{\rm eff}(z|0) , \tag{6.11}$$

where l is a Poisson variable with parameter $\gamma$. As anticipated, $h_n(0)$ can be expressed 
uniquely in terms of the distribution of the U variables. Using Eq. (6.7), we get 

$$h_n(0) = \mathbb{E}_y\mathbb{E}_l\mathbb{E}_{\{u_a\}}\log_2\Big[\sum_x\frac{Q_V(y|x)}{Q_V(y|0)}\prod_{a=1}^{l}P_{u_a}(x)\Big] + \gamma\,\mathbb{E}_u\log_2(1+e^{-2u}) . \tag{6.12}$$



The next step is also very natural. Since we are interested in estimating $h_n$, we write 

$$h_n = h_n(0) + \int_0^{\gamma}\frac{dh_n}{ds}(s)\,ds . \tag{6.13}$$

A straightforward calculation yields: 

$$\frac{dh_n}{ds}(s) = \sum_k\frac{P_k}{P'(1)}\,\frac{1}{n^k}\sum_{i_1\dots i_k}\mathbb{E}_{\hat{y}}\mathbb{E}_s\log_2\left\langle\frac{Q_C(\hat{y}|x_{i_1}\oplus\cdots\oplus x_{i_k})}{Q_C(\hat{y}|0)}\right\rangle_s - \frac{1}{n}\sum_{i\in V}\mathbb{E}_z\mathbb{E}_s\log_2\left\langle\frac{Q_{\rm eff}(z|x_i)}{Q_{\rm eff}(z|0)}\right\rangle_s , \tag{6.14}$$

where the average $\langle\,\cdot\,\rangle_s$ is taken with respect to the interpolating probability distribution (6.6). 
The details of this computation are reported in the appendix. Of course the right-hand side of Eq. 
(6.14) is still quite hard to evaluate because the averages $\langle\,\cdot\,\rangle_s$ are complicated objects. We shall 
therefore approximate them by much simpler averages under which the $x_i$'s are independent 
and have log-likelihoods which are distributed as V-variables. More precisely we define 



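$$\frac{d\hat{h}_n}{ds}(s) = \sum_k\frac{P_k}{P'(1)}\,\frac{1}{n^k}\sum_{i_1\dots i_k}\mathbb{E}_{\hat{y}}\mathbb{E}_V\log_2\Big[\sum_{x_{i_1}\dots x_{i_k}}\frac{Q_C(\hat{y}|x_{i_1}\oplus\cdots\oplus x_{i_k})}{Q_C(\hat{y}|0)}\prod_{j=1}^{k}P_{V_{i_j}}(x_{i_j})\Big] - \frac{1}{n}\sum_{i\in V}\mathbb{E}_z\mathbb{E}_V\log_2\Big[\sum_{x_i}\frac{Q_{\rm eff}(z|x_i)}{Q_{\rm eff}(z|0)}\,P_{V_i}(x_i)\Big] . \tag{6.15}$$

Notice that in fact this expression does not depend upon s. Adding and subtracting it in 
Eq. (6.13), and after a few algebraic manipulations, we obtain 

$$h_n = \phi_V + \int_0^{\gamma}\Big[\frac{dh_n}{ds}(s) - \frac{d\hat{h}_n}{ds}(s)\Big]\,ds = \phi_V + \int_0^{\gamma} R_n(s)\,ds . \tag{6.16}$$

The proof is completed by showing that $R_n(s) \ge 0$ for any $s \in [0,\gamma]$. This calculation is 
reported in the appendix. □ 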



7 Proof for multi-Poisson ensembles 

The proof for multi-Poisson ensembles follows the same strategy as for Poisson ensembles. The 
only difference is that the interpolating system is obviously more complex. 

Given the ensemble parameters (n, γ, Λ, P), and defining as in Sec. 2.3 $t_{\max} = \lfloor\Lambda'(1)/\gamma\rfloor - 1$, 
we introduce an interpolating ensemble for each pair $(t_*, s)$, with $t_* \in \{0, \dots, t_{\max}-1\}$ and 
$s \in [0, \gamma]$ as follows. The first $t_*$ rounds (i.e. $t = 0, \dots, t_*-1$ in the definition of the ensemble) in the graph 
construction are the same as for the original multi-Poisson graph. 

Next, during round $t_*$, $m_k^{(t_*)}$ is drawn from a Poisson distribution with mean $nsP_k/P'(1)$ 
(instead of $n\gamma P_k/P'(1)$), and $m_k^{(t_*)}$ right (check) nodes of degree k are added for each $k = 
2, \dots, k_{\max}$. The neighbors of each of these check nodes are i.i.d. random variables in V with 
distribution $w_i(t_*) = [d_i(t_*)]_+/(\sum_j [d_j(t_*)]_+)$. In order to compensate for the smaller number 
of right nodes, an integer $l_i(t_*)$ is drawn for each $i \in V$ from a Poisson distribution with mean 
$n(\gamma - s)\,w_i(t_*)$. As in the previous Section, this means that $l_i(t_*)$ repetitions of the bit $x_i$ will 
be transmitted through an effective channel with transition probability $Q_{\rm eff}(\,\cdot\,|\,\cdot\,)$. Finally, the 
number of free sockets is updated by setting $d_i(t_*+1) = d_i(t_*) - \Delta_i(t_*) - l_i(t_*)$, where $\Delta_i(t_*)$ 
is the number of times the node i has been chosen as neighbor of a right node in this round. 

During rounds $t = t_*+1, \dots, t_{\max}-1$, no right node is added to the factor graph. On the 
other hand, for each $i \in V$ a random integer $l_i(t)$ is drawn from a Poisson distribution with 
mean $n\gamma\,w_i(t)$. This means that the bit $x_i$ will be further retransmitted $l_i(t)$ times through the 
effective channel. Furthermore the number of free sockets is updated at the end of each round 
by setting $d_i(t+1) = d_i(t) - l_i(t)$. 

This completes the definition of the Tanner graph ensemble. We denote by $C(t_*, s)$ the set 
of function nodes and by $l_i = \sum_t l_i(t)$ the total number of times the bit $x_i$ is transmitted 
through the effective channel. The overall conditional distribution of the channel input given 
the output has the same form (6.6) as for the simple Poisson ensemble. 

The overall number of function nodes in the $(t_*, s)$ ensemble is easily seen to be a Poisson 
variable with mean $n(\gamma t_* + s)/P'(1)$. In fact we add Poisson($n\gamma/P'(1)$) function nodes during 
each of the first $t_*$ rounds, and Poisson($ns/P'(1)$) during the $(t_*+1)$-th round. Analogously, 
the overall number of repetitions $\sum_i l_i$ is a Poisson variable with mean $n[\gamma(t_{\max} - t_* - 1) + \gamma - s]$. 
Using these remarks, we get the expression 

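$$h_n(t_*,s) = \frac{1}{n}\,\mathbb{E}_C H_n(X|Y,\hat{Y},Z) \tag{7.1}$$

$$\phantom{h_n(t_*,s)} = \frac{1}{n}\,\mathbb{E}_C\mathbb{E}_{y,z}\log_2 Z(y,\hat{y},z) - \frac{\gamma t_* + s}{P'(1)}\sum_{\hat{y}} Q_C(\hat{y}|0)\log_2 Q_C(\hat{y}|0) - \sum_{y} Q_V(y|0)\log_2 Q_V(y|0) - \big(\gamma(t_{\max}-t_*) - s\big)\sum_{z} Q_{\rm eff}(z|0)\log_2 Q_{\rm eff}(z|0) . \tag{7.2}$$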

Let us now look at a few particular cases of the interpolating system. For $(t_* = t_{\max}-1, s = \gamma)$ 
we recover the original multi-Poisson ensemble. Moreover, for any $t_* \in \{0, \dots, t_{\max}-2\}$ the 
ensembles $(t_*, s = \gamma)$ and $(t_*+1, s = 0)$ are identically distributed. Finally, when $(t_* = 0, s = 0)$ 
the set of factor nodes is empty with probability one, and the resulting coding scheme is just 
an irregular repetition code (bit $x_i$ being repeated $l_i$ times) used over the channel $Q_{\rm eff}(\,\cdot\,|\,\cdot\,)$. 
If we denote by $h_n(t_*, s)$ the expected entropy per bit in the interpolating ensemble, we get 

$$h_n(t_{\max}-1, \gamma) = h_n , \tag{7.3}$$

$$h_n(0,0) = \mathbb{E}_{y,z}\mathbb{E}_l\log_2\Big[\sum_x\frac{Q_V(y|x)}{Q_V(y|0)}\prod_{a=1}^{l}Q_{\rm eff}(z_a|x)\Big] - \gamma\,t_{\max}\sum_z Q_{\rm eff}(z|0)\log_2 Q_{\rm eff}(z|0) . \tag{7.4}$$

Here the expectations on $y$, $\{z_a\}$ are taken with respect to the distributions $Q_V(y|0)$, $Q_{\rm eff}(z|0)$, 
while l is distributed according to the expected degree profile $\Lambda_l^{(n,\gamma)}$. Moreover, we used the fact 
that $\sum_l \Lambda_l^{(n,\gamma)}\, l = \gamma\,t_{\max}$. Finally, as in the previous Section (cf. Eq. (6.12)) the conditional 
entropy $h_n(0,0)$ depends upon the effective channel transition probability only through the 
distribution of the log-likelihood U. 
The next step consists in writing 

$$h_n(t_*,\gamma) = h_n(t_*,0) + \int_0^{\gamma}\frac{dh_n}{ds}(t_*,s)\,ds , \tag{7.5}$$

for $t_* \in \{0, \dots, t_{\max}-1\}$. Using the fact that $h_n(t_*, \gamma) = h_n(t_*+1, 0)$, this implies 

$$h_n = h_n(0,0) + \sum_{t_*=0}^{t_{\max}-1}\int_0^{\gamma}\frac{dh_n}{ds}(t_*,s)\,ds . \tag{7.6}$$
The derivative with respect to the interpolation parameter is similar to the simple Poisson case: 



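$$\frac{dh_n}{ds}(t_*,s) = \sum_k\frac{P_k}{P'(1)}\sum_{i_1\dots i_k}\mathbb{E}_{\hat{y}}\,\mathbb{E}^{(i_1\dots i_k)}\Big[w_{i_1}(t_*)\cdots w_{i_k}(t_*)\,\log_2\Big\langle\frac{Q_C(\hat{y}|x_{i_1}\oplus\cdots\oplus x_{i_k})}{Q_C(\hat{y}|0)}\Big\rangle_{(t_*,s)}\Big]$$

$$\qquad - \sum_{i\in V}\mathbb{E}_z\,\mathbb{E}^{(i)}\Big[w_i(t_*)\,\log_2\Big\langle\frac{Q_{\rm eff}(z|x_i)}{Q_{\rm eff}(z|0)}\Big\rangle_{(t_*,s)}\Big] + \varphi(n;t_*,s) , \tag{7.7}$$

where $\langle\,\cdot\,\rangle_{(t_*,s)}$ denotes expectation with respect to the conditional distribution (6.6) appropriate 
for the $(t_*, s)$ ensemble and the expectations $\mathbb{E}^{(i_1\dots i_k)}$ are defined in App. E. Moreover, the 
term $\varphi(n;t_*,s)$ has the following properties. 

A. $|\varphi(n;t_*,s)| \le C_1$ for some constant $C_1$ which depends only upon the ensemble parameters 
Λ, P and γ. 

B. $|\varphi(n;t_*,s)| \le C_2(t_*,s)\sqrt{(\log n)^3/n}$ for some function $C_2(t_*,s)$ which does not depend upon 
n. 

We refer to App. E for the details of this computation. Notice that the equivalent expression for 
Poisson codes, cf. Eq. (6.14), is recovered by setting $w_i(t_*) = 1/n$ and dropping the correction 
$\varphi(\,\cdot\,)$. 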

Finally, we introduce an 'approximation' $\frac{d\hat{h}_n}{ds}(t_*,s)$ to Eq. (7.7) analogous to Eq. (6.15). 
More precisely, we replace the expectations $\langle\,\cdot\,\rangle_{(t_*,s)}$ with expectations over product measures 
of the form $\prod_i P_{V_i}(x_i)$, the ensemble averages $\mathbb{E}^{(i_1\dots i_k)}$ with averages over i.i.d. $V_i$'s, and we 
drop the remainder $\varphi(\,\cdot\,)$. Using Eq. (7.6) and after rearranging various terms (the relation 
$\sum_l \Lambda_l^{(n,\gamma)}\, l = \gamma\,t_{\max}$ is useful here) we end up with 

$$h_n = \phi_V(\Lambda^{(n,\gamma)},P) + \sum_{t_*=0}^{t_{\max}-1}\int_0^{\gamma}\Big[\frac{dh_n}{ds}(t_*,s) - \frac{d\hat{h}_n}{ds}(t_*,s)\Big]\,ds$$

$$\phantom{h_n} = \phi_V(\Lambda^{(n,\gamma)},P) + \sum_{t_*=0}^{t_{\max}-1}\int_0^{\gamma} R_n(t_*,s)\,ds + o_n(1) , \tag{7.8}$$

where the second equality follows by applying the dominated convergence theorem to $\varphi(\,\cdot\,)$. 
The proof is finished by showing that $R_n(t_*, s)$ is non-negative for any $t_* \in \{0, \dots, t_{\max}-1\}$ 
and $s \in [0,\gamma]$. This calculation is very similar to the simple Poisson case, and is discussed in 
the appendix. □ 



8 Proof for standard ensembles 

We proved points 1 and 2 of Theorem 2 directly. We will now show that point 2 implies the lower 
bound for standard ensembles (point 3), which is the more relevant case in practice. 

The idea is that the standard ensemble (n, Λ, P) is indeed 'very close' to the multi-Poisson 
ensemble (n, γ, Λ, P) for small γ. In order to implement this idea, we state a preliminary result 
here. 

Lemma 5 Let $C_1$ and $C_2$ be two codes with the same blocklength n from any of the ensembles 
defined in Section 2 (the ensemble does not need to be the same). Assume they are used to 
communicate through the same noisy channel and let $H_1(X|Y_1)$, $H_2(X|Y_2)$ be the corresponding 
conditional entropies. If $C_1$ can be obtained from $C_2$ with $n_{\rm r}$ rewirings, then 
$|H_1(X|Y_1) - H_2(X|Y_2)| \le n_{\rm r}$. 

Proof: Recall that a rewiring was defined as either the removal or the addition of a function 
node to the Tanner graph. We already proved (cf. Eqs. (5.5) to (5.8) and the related discussion) 
that the introduction of a single function node induces a change in the conditional entropy 
which is at most 1 bit. □ 
Let now $0 < \gamma < 1$, and consider a pair of Tanner graphs $(G_{\rm s}, G_{\rm mP})$ distributed according to 
the coupling in Lemma 1. In particular, the marginal distributions are $G_{\rm s} \stackrel{d}{=} (n, \Lambda, P)$ and 
$G_{\rm mP} \stackrel{d}{=} (n, \Lambda, P, \gamma)$. With an abuse of notation, denote by $h_n(G_{\rm s})$ and $h_n(G_{\rm mP})$ the corresponding 
conditional entropy per bit and by $h_n$, $h_n^{(\gamma)}$ their ensemble averages. From Lemmas 1 and 5 it 
follows that 

$$\lim_{n\to\infty}\mathbb{P}\big[\,|h_n(G_{\rm mP}) - h_n(G_{\rm s})| > A\gamma^b\,\big] = 0 , \tag{8.1}$$

where the constants A and b are as in Lemma 1. Therefore 

$$|h_n^{(\gamma)} - h_n| = \big|\mathbb{E}[h_n(G_{\rm mP}) - h_n(G_{\rm s})]\big| \le \mathbb{E}\,|h_n(G_{\rm mP}) - h_n(G_{\rm s})| \le A\gamma^b + o_n(1) , \tag{8.2}$$

where the last inequality follows from Eq. (8.1) together with the remark that $h_n(G) \le 1$ for 
any code. By taking the large blocklength limit, we get 

$$\liminf_{n\to\infty} h_n \ge \liminf_{n\to\infty} h_n^{(\gamma)} - A\gamma^b \ge \phi_V(\Lambda^{(\gamma)},P) - A\gamma^b , \tag{8.3}$$

where the last inequality follows from our lower bound on multi-Poisson ensembles, Eq. (6.4). 
Next we notice that $\phi_V(\Lambda, P)$ is a continuous function of Λ (with respect to the total variation 
distance, see the appendix for a definition), once the degree distribution P and the V-variables 
distribution have been fixed. Moreover by Corollary 1, $\lim_{\gamma\to 0}\Lambda^{(\gamma)} = \Lambda$ in the total variation 
distance sense. Therefore we can obtain the thesis (6.5) by taking the $\gamma \to 0$ limit in the last 
expression. □ 



9 Examples and applications 

The optimization problem in Eq. (6.3) is, in general, rather difficult. Nevertheless, one can 
easily obtain sub-optimal bounds on the entropy $h_n$ by cleverly choosing the distribution of 
the V-variables to be used in Eq. (6.2). Moreover, bounds can be optimized through density 
evolution. Although a complete discussion of the optimization problem is beyond the scope of 
this paper, a rather simple approach, cf. Sec. 9.5 and Tab. 1, already gives very good results 
(indeed we believe them to be optimal). 

Our main focus here will be on standard (n, Λ, P) ensembles. As in our original definition, 
cf. Eq. (2.1), we shall generically consider the case in which $\Lambda_0 = \Lambda_1 = 0$ (no degree 0 or degree 
1 variable nodes). However, most of the arguments can be adapted to Poisson ensembles too. 
On the other hand, we shall always assume $P_0 = P_1 = 0$ (no degree 0 or degree 1 check nodes). 

Throughout this Section, we shall use the notation $h = \liminf_{n\to\infty} h_n$ for the asymptotic 
conditional entropy per symbol. 



9.1 Shannon threshold 



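Assume V = 0 with probability 1, and therefore, from Eq. (6.1), U = 0 with probability 1. 
Plugging these distributions into Eq. (6.2) we get 

$$\phi_V = -\frac{\Lambda'(1)}{P'(1)} + \mathbb{E}_y\log_2\Big[\sum_x\frac{Q_V(y|x)}{Q_V(y|0)}\Big] + \frac{\Lambda'(1)}{P'(1)}\,\mathbb{E}_{\hat{y}}\log_2\Big[\sum_x\frac{Q_C(\hat{y}|x)}{Q_C(\hat{y}|0)}\Big] . \tag{9.1}$$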
Using Theorem 2, and after a few manipulations, we get 

$$h \ge 1 - \frac{C(Q)}{r_{\rm des}} , \quad \text{for the LDGM}(n, \Lambda, P) \text{ ensemble,} \tag{9.2}$$

$$h \ge r_{\rm des} - C(Q) , \quad \text{for the LDPC}(n, \Lambda, P) \text{ ensemble,} \tag{9.3}$$

where we denoted by C(Q) the capacity of the BIOS channel with transition probability Q(y|x): 

$$C(Q) = 1 - \sum_y Q(y|0)\log_2\Big[\sum_x\frac{Q(y|x)}{Q(y|0)}\Big] . \tag{9.4}$$

In other words reliable communication (which requires $h_n \to 0$ as $n \to \infty$) can be achieved only 
if the design rate is smaller than channel capacity. For the LDGM ensemble this statement is 
equivalent to the converse of the channel coding theorem, because $r_{\rm des}$ is concentrated around the 
actual rate. This is the case also for standard LDPC ensembles with no degree 0 or degree 1 
variable nodes [36]. 

For general LDPC ensembles, Eq. (9.3) is slightly weaker than the channel coding theorem 
because the actual rate can be larger than the design rate. However, as shown in the next 
Sections, the bound can be easily improved by changing the distribution of V. 

Of course these results could have been derived from information-theoretic arguments. However 
it is nice to see that they are indeed contained in Theorem 2. 
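
The capacity formula (9.4) and the resulting bounds (9.2), (9.3) are easy to evaluate for discrete-output channels. The sketch below assumes that a BIOS channel is represented by a dictionary mapping each output symbol y to the pair (Q(y|0), Q(y|1)); this representation and the function names are our own choices for illustration.

```python
import math

def capacity(Q):
    """Capacity of a BIOS channel via Eq. (9.4).

    Q maps each output symbol y to the pair (Q(y|0), Q(y|1)).
    Outputs with Q(y|0) = 0 do not contribute to the sum over y.
    """
    total = 0.0
    for q0, q1 in Q.values():
        if q0 > 0.0:
            total += q0 * math.log2((q0 + q1) / q0)
    return 1.0 - total

def bec(eps):
    """Binary erasure channel with erasure probability eps."""
    return {"0": (1 - eps, 0.0), "1": (0.0, 1 - eps), "*": (eps, eps)}

def bsc(p):
    """Binary symmetric channel with flip probability p."""
    return {"0": (1 - p, p), "1": (p, 1 - p)}

def shannon_lower_bound(design_rate, cap, kind):
    """Right-hand side of Eq. (9.2) (LDGM) or Eq. (9.3) (LDPC)."""
    if kind == "LDGM":
        return 1.0 - cap / design_rate
    if kind == "LDPC":
        return design_rate - cap
    raise ValueError("kind must be 'LDPC' or 'LDGM'")
```

For instance, `capacity(bec(0.3))` gives 0.7, and a rate 1/2 LDPC ensemble over BEC(0.6) has asymptotic conditional entropy per bit of at least 0.1.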



9.2 Non-negativity of the entropy 

Let us consider the opposite limit: $V = +\infty$ with probability 1, and distinguish two cases: 

LDPC(n, Λ, P) ensemble. As a consequence of Eq. (6.1), also $U = +\infty$ with probability 1. 
Using Theorem 2 and Eq. (6.2), it is easy to obtain 

$$h \ge \Lambda_0\, H_Q(X|Y) , \tag{9.5}$$

where $H_Q(X|Y) = 1 - C(Q)$ is the conditional entropy for a single bit transmitted across the 
channel Q(y|x). The interpretation of Eq. (9.5) is straightforward. Typically, a fraction $\Lambda_0$ 
of the variable nodes have degree zero. The conditional entropy (5.1) is lower-bounded by the 
entropy of these variables. 

LDGM(n, Λ, P) ensemble. Equation (6.1) implies that U is distributed as the log-likelihoods 
$J_a = (1/2)\log[Q(y|0)/Q(y|1)]$, see Eq. (3.9). It is easy then to evaluate the bound: 



$$h \ge \mathbb{E}_l\mathbb{E}_y\log_2\Big[\sum_x\prod_{i=1}^{l}\frac{Q(y_i|x)}{Q(y_i|0)}\Big] . \tag{9.6}$$

The meaning of this inequality is, once again, quite clear: 

$$H_n(X|Y) \ge \sum_{i=1}^{n} H_n(X_i|X^{(i)},Y) \ge \sum_{i=1}^{n} H_n(X_i|\{X_j = 0\ \forall j\ne i\};Y) , \tag{9.7}$$

where we introduced $X^{(i)} = \{X_j\}_{j\ne i}$. The above inequalities are consequences of the entropy 
chain rule and of the fact that conditioning reduces entropy. Taking the expectation with 
respect to the code ensemble and letting $n \to \infty$ yields (9.6). 
Figure 4: Left frame: trial entropy $\phi_V$ on the erasure channel BEC(ε) as a function of the parameter 
z characterizing the V-distribution, cf. Eq. (9.10). Here we consider the LDPC(n, Λ, P) ensemble with 
$\Lambda(x) = x^3$ and $P(x) = x^6$ (i.e. a regular (3,6) Gallager code). A square (□) marks the high bit-error-rate 
local maximum $z_{\rm bad}(\epsilon)$. Right frame: graphical representation of the equation z = f(z) yielding the local 
extrema of the trial entropy, cf. Eq. (9.11). 

9.3 Binary Erasure Channel 

Let us define the binary erasure channel BEC(ε). We have $\mathcal{A} = \{0, 1, *\}$ and $Q(0|0) = Q(1|1) = 
1 - \epsilon$, $Q(*|0) = Q(*|1) = \epsilon$. Since the log-likelihoods (3.9) take values in $\{0, +\infty\}$ it is natural 
to assume the same property to hold for the variables U and V. We denote by $z$ ($\hat{z}$) the 
probability for V (U) to be 0. As in the previous example we distinguish two cases. 
LDPC(n, Λ, P) ensemble. Equation (6.1) yields 

$$\hat{z} = 1 - \rho(1-z) . \tag{9.8}$$

It is easy to show that Eq. (6.5) implies the bound 

$$h \ge \sup_{z\in[0,1]}\phi(z, 1-\rho(1-z)) , \tag{9.9}$$

where 

$$\phi(z,\hat{z}) = \Lambda'(1)\,z(1-\hat{z}) - \frac{\Lambda'(1)}{P'(1)}\,[1 - P(1-z)] + \epsilon\,\Lambda(\hat{z}) . \tag{9.10}$$

Notice that $\phi(z, 1-\rho(1-z)) \equiv \psi(z)$ is a smooth function of $z \in [0,1]$. Therefore the sup in 
Eq. (9.9) is achieved either at z = 0, 1 or for a $z \in (0,1)$ such that ψ(z) is stationary. It is easy 
to realize that the stationarity condition is equivalent to the equations 

$$z = \epsilon\,\lambda(\hat{z}) , \qquad \hat{z} = 1 - \rho(1-z) . \tag{9.11}$$

The reader will recognize the fixed point conditions for BP decoding [29]. 

Let us consider a specific example: the (3, 6) regular Gallager code. We have A(x) = x 3 and 
P(x) = x 6 : P(x) is convex for any x € R and therefore Theorem |2] applies. The design rate is 
r dos = 1/2- In Fig. 0]we show the function ip(z) for several values of the erasure probability. 
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Figure 5: Left frame: trial entropy evaluated at its local maxima .Zgood(e) and Zbad(e)> as a function of the 
erasure probability e (notice that vp(zg 00 d(e)) = identically). Right frame: bit error rate under maximum 
likelihood decoding. The dashed line is the lower bound obtained from Fano inequality. The continuous 
curve is the conjectured exact result. 



In the right frame we present the function $f(z) = \epsilon\lambda(1-\rho(1-z))$ for some of these values. At small $\epsilon$, the conditions (9.11) have a unique solution at $z_{\rm good}(\epsilon) = 0$, and $\psi(z)$ has its unique local maximum there. The corresponding lower bound on the conditional entropy is $\psi(z_{\rm good}(\epsilon)) = 0$. For $\epsilon > \epsilon_{\rm BP} \approx 0.4294398$ a secondary maximum $z_{\rm bad}(\epsilon)$ appears. Density evolution converges to $z_{\rm bad}(\epsilon)$ and therefore this fixed point controls the bit error rate under BP decoding. For $\epsilon_{\rm BP} < \epsilon < \epsilon_{\rm MAP} \approx 0.4881508$, $\psi(z_{\rm bad}) < \psi(z_{\rm good})$ and therefore this local maximum is irrelevant for MAP decoding. Above $\epsilon_{\rm MAP}$, $\psi(z_{\rm bad}) > \psi(z_{\rm good})$ and therefore $z_{\rm bad}(\epsilon)$ controls the properties of MAP decoding too.
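The two thresholds quoted above can be recovered numerically from Eqs. (9.10) and (9.11). The following is a minimal sketch, assuming a regular ensemble with $\Lambda(x) = x^l$, $P(x) = x^k$; the function names are illustrative and do not come from the paper. It parametrizes the non-trivial fixed points by $z$, takes $\epsilon_{\rm BP}$ as the smallest noise level at which such a fixed point exists, and locates $\epsilon_{\rm MAP}$ as the point where $\psi(z_{\rm bad})$ crosses $\psi(z_{\rm good}) = 0$.

```python
# Sketch: BEC thresholds of a regular (l, k) LDPC ensemble from Eqs. (9.10)-(9.11).
# All names are illustrative assumptions, not notation from the paper.

def eps_on_branch(z, l, k):
    """Erasure probability at which z > 0 solves the fixed-point equations (9.11)."""
    z_hat = 1.0 - (1.0 - z) ** (k - 1)      # z_hat = 1 - rho(1 - z)
    return z / z_hat ** (l - 1)             # from z = eps * lambda(z_hat)

def psi(z, eps, l, k):
    """Trial entropy phi(z, 1 - rho(1 - z)) of Eq. (9.10) for Lambda = x^l, P = x^k."""
    z_hat = 1.0 - (1.0 - z) ** (k - 1)
    return (l * z * (1.0 - z_hat)
            - (l / k) * (1.0 - (1.0 - z) ** k)
            + eps * z_hat ** l)

def bec_thresholds(l, k, grid=200_000, iters=200):
    # eps_BP: smallest eps for which a fixed point with z > 0 exists (grid search).
    z_bp = min((i / grid for i in range(1, grid + 1)),
               key=lambda z: eps_on_branch(z, l, k))
    eps_bp = eps_on_branch(z_bp, l, k)
    # eps_MAP: psi(z_bad) changes sign on the branch z_bp < z <= 1 (bisection).
    lo, hi = z_bp, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(mid, eps_on_branch(mid, l, k), l, k) < 0.0:
            lo = mid
        else:
            hi = mid
    return eps_bp, eps_on_branch(hi, l, k)

eps_bp, eps_map = bec_thresholds(3, 6)
print(f"eps_BP ~ {eps_bp:.7f}, eps_MAP ~ {eps_map:.7f}")
```

The bisection relies on $\psi$ being negative at the point where the fixed point first appears and positive at $z = 1$, which holds for the (3, 6) example; other ensembles may need a different bracketing.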

In Fig. 5, left frame, we reproduce the function $\psi(z_{\rm bad}(\epsilon))$ as a function of $\epsilon$. Fano inequality (cf. the corresponding Lemma) can be used for obtaining lower bounds on block and bit error rates in terms of the quantity $\max\{\psi(z_{\rm good}(\epsilon)), \psi(z_{\rm bad}(\epsilon))\}$. The result for our running example is presented in Fig. 5, right frame. It is evident that the result is not tight because of the sub-optimality of Fano inequality. For instance, in the $\epsilon\to 1$ limit, $\psi(z_{\rm bad}(\epsilon))$ yields the lower bound $h_n \ge 1/2$. This result is easily understood: since no bit has been received, all the $2^{n/2}$ codewords are equiprobable. On the other hand Fano inequality yields a poor $P_b(\epsilon = 1) \ge 0.11003$.
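The numerical value quoted for the Fano bound can be checked with a few lines of code. The sketch below assumes the bit-wise form of Fano inequality, $h_2(P_b) \ge h$, with $h_2$ the binary entropy function; the helper names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def fano_bit_error_bound(h, iters=100):
    """Smallest p in [0, 1/2] with h2(p) >= h, found by bisection (h2 is increasing there)."""
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h2(mid) < h:
            lo = mid
        else:
            hi = mid
    return hi

# Entropy lower bound h = 1/2 at eps = 1 for the (3, 6) ensemble
print(round(fano_bit_error_bound(0.5), 5))
```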

A better (albeit non-rigorous) estimate is provided by the following recipe. Notice that BP decoding yields (in the large blocklength limit)

$P_b^{\rm BP}(\epsilon) = \epsilon\Lambda(\hat z_{\rm good}(\epsilon))$ for $\epsilon < \epsilon_{\rm BP}$, $\quad P_b^{\rm BP}(\epsilon) = \epsilon\Lambda(\hat z_{\rm bad}(\epsilon))$ for $\epsilon > \epsilon_{\rm BP}$. (9.12)

Our prescription consists in taking

$P_b^{\rm MAP}(\epsilon) = \epsilon\Lambda(\hat z_{\rm good}(\epsilon))$ for $\epsilon < \epsilon_{\rm MAP}$, $\quad P_b^{\rm MAP}(\epsilon) = \epsilon\Lambda(\hat z_{\rm bad}(\epsilon))$ for $\epsilon > \epsilon_{\rm MAP}$. (9.13)

In other words, BP is asymptotically optimal except in the interval $[\epsilon_{\rm BP}, \epsilon_{\rm MAP}]$. Generalizations and heuristic justification of this recipe will be provided in the next Sections. The resulting curve for our running example is reported in Fig. 5, right frame.
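The recipe can be turned into a small numerical routine. The sketch below is only an illustration for the (3, 6) ensemble: it takes the two thresholds quoted in the text as given constants, obtains $\hat z_{\rm bad}(\epsilon)$ by iterating the density evolution map from the fully erased initial condition, and uses $\hat z_{\rm good} = 0$. The function names are assumptions made for this example.

```python
# Thresholds quoted in the text for Lambda(x) = x^3, P(x) = x^6.
EPS_BP, EPS_MAP = 0.4294398, 0.4881508
L_DEG, K_DEG = 3, 6

def z_hat_bad(eps, iters=20_000):
    """Iterate z <- eps * lambda(1 - rho(1 - z)) starting from z = eps and
    return the corresponding z_hat = 1 - rho(1 - z)."""
    z = eps
    for _ in range(iters):
        z_hat = 1.0 - (1.0 - z) ** (K_DEG - 1)
        z = eps * z_hat ** (L_DEG - 1)
    return 1.0 - (1.0 - z) ** (K_DEG - 1)

def pb_bp(eps):
    """Eq. (9.12): eps * Lambda(z_hat_bad) above eps_BP, zero below."""
    return eps * z_hat_bad(eps) ** L_DEG if eps > EPS_BP else 0.0

def pb_map(eps):
    """Eq. (9.13): same expression, but switched on only above eps_MAP."""
    return eps * z_hat_bad(eps) ** L_DEG if eps > EPS_MAP else 0.0

for eps in (0.30, 0.45, 0.49):
    print(eps, pb_bp(eps), pb_map(eps))
```

The two curves differ only for $\epsilon_{\rm BP} < \epsilon < \epsilon_{\rm MAP}$, where the BP estimate is positive while the conjectured MAP value vanishes.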



  l    k     $\epsilon_{\rm BP}$    $\epsilon_{\rm MAP}$: New UB   Gallager UB   Gallager LB
  2    4     1/3          1/3              *             *
  3    6     0.4294398    0.4881508        0.4999118     0.4833615
  4    8     0.3834465    0.4977409        0.4999118     0.4967571
  5    10    0.3415500    0.4994859        0.4999997     0.4992593
  6    12    0.3074623    0.4998757        0.5000000     0.4998207

Table 1: Maximum a posteriori probability and belief propagation thresholds on the BEC for several ensembles of the form LDPC($n, x^l, x^k$) with $l = (1 - r_{\rm des})k$ and $r_{\rm des} = 1/2$. For the MAP threshold we compare several different bounds: 'New UB' is the upper bound derived in this paper; 'Gallager UB' is Gallager's upper bound as generalized in Ref. [9]; 'Gallager LB' is the lower bound derived using Gallager's technique, as applied in Ref. [11].



The analysis of this simple example uncovers the existence of three distinct regimes: (i) A low noise regime, $\epsilon < \epsilon_{\rm BP}$: both BP and MAP decoding are effective in this case, and the bit error rate vanishes in the large blocklength limit. (ii) An intermediate noise regime, $\epsilon_{\rm BP} < \epsilon < \epsilon_{\rm MAP}$: only MAP decoding can produce vanishing error rates here. (iii) A high noise regime, $\epsilon_{\rm MAP} < \epsilon$: the bit error rate under MAP decoding is bounded away from zero. In Table 1 we report the values of $\epsilon_{\rm BP}$ and $\epsilon_{\rm MAP}$ for a few ensembles LDPC($n, \Lambda, P$) with $\Lambda(x) = x^l$, $P(x) = x^k$ and $r_{\rm des} = 1/2$. As we shall discuss below, this pattern is quite general.

LDGM($n, \Lambda, P$) ensemble. It is interesting to look at the differences between LDGM and LDPC ensembles within the BEC context. The requirement (6.1) implies

$\hat z = 1 - (1-\epsilon)\rho(1-z)$. (9.14)

Applying Theorem 2 we get the bound

$h \ge \sup_{z\in[0,1]} \phi(z, 1-(1-\epsilon)\rho(1-z))$, (9.15)

where, with a slight abuse of notation, we defined

$\phi(z,\hat z) = \Lambda'(1)\, z(1-\hat z) - \frac{\Lambda'(1)(1-\epsilon)}{P'(1)}\,[1 - P(1-z)] + \Lambda(\hat z)$. (9.16)

As in the LDPC case we look at the stationarity condition of the function $\psi(z) = \phi(z, 1-(1-\epsilon)\rho(1-z))$. Elementary calculus yields the pair of equations

$z = \lambda(\hat z)$, $\hat z = 1-(1-\epsilon)\rho(1-z)$, (9.17)

that, once again, coincide with the fixed point conditions for BP decoding. These equations have a noise-independent solution $z_{\rm bad}(\epsilon) = 1$ (implying $\hat z = 1$ because of Eq. (9.14)). Theorem 2 yields $h \ge 1 - C(\epsilon)/r_{\rm des}$, with $C(\epsilon) = 1-\epsilon$ the channel capacity: we recover in this context the simple lower bound of Sec. 9.1.

A better understanding of the peculiarities of LDGM ensembles is obtained by looking at a particular example. Consider, for instance, $\Lambda(x) = x^8$ and $P(x) = x^4$, which corresponds to a design rate $r_{\rm des} = 1/2$. Theorem 2 applies because $P(x)$ is convex on $\mathbb{R}$. In Fig. 6, left frame, we plot the function $\psi(z)$ for several values of the erasure probability $\epsilon$. It is clear that $z_{\rm bad} = 1$ is always a local maximum. A second local maximum $z_{\rm good}(\epsilon)$ appears when the erasure probability becomes smaller than $\epsilon_{\rm BP} \approx 0.6165534$. The extremum at $z_{\rm good}(\epsilon)$ becomes a global maximum for $\epsilon < \epsilon_{\rm MAP}$, with $\epsilon_{\rm MAP} \approx 0.5022591$. In Fig. 6, right frame, we reproduce the function $f(z) = \lambda(1-(1-\epsilon)\rho(1-z))$ in terms of which the stationarity condition (9.17) reads $z = f(z)$. We also mark the solutions $z_{\rm good}(\epsilon)$ (corresponding to a local maximum of $\psi(z)$) and $z_{\rm inst}(\epsilon)$ (corresponding to a local minimum of $\psi(z)$).

Figure 6: Left frame: trial entropy for the LDGM($n, \Lambda, P$) ensemble on the BEC($\epsilon$) channel. Here $\Lambda(x) = x^8$ and $P(x) = x^4$ (the design rate is $r_{\rm des} = 1/2$). A square (□) marks the small-$P_b$ stationary point $z_{\rm good}(\epsilon)$. Right frame: graphical representation of the stationarity condition $z = f(z)$.
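The LDGM thresholds can be checked numerically in the same spirit as for the LDPC case. The following sketch assumes $\Lambda(x) = x^l$ and $P(x) = x^k$ and uses Eqs. (9.16) and (9.17) as written above; all function names are hypothetical. Here $\epsilon_{\rm BP}$ is the largest noise level at which a fixed point with $z < 1$ exists, and $\epsilon_{\rm MAP}$ is the noise level at which $\psi(z_{\rm good})$ equals $\psi(z_{\rm bad} = 1)$.

```python
# Sketch: BEC thresholds of a regular LDGM ensemble from Eqs. (9.16)-(9.17).
# All names are illustrative assumptions, not notation from the paper.

def eps_on_branch(z, l, k):
    """Erasure probability at which 0 < z < 1 solves Eq. (9.17)."""
    z_hat = z ** (1.0 / (l - 1))                        # invert z = lambda(z_hat)
    return 1.0 - (1.0 - z_hat) / (1.0 - z) ** (k - 1)   # z_hat = 1 - (1 - eps) rho(1 - z)

def psi(z, eps, l, k):
    """phi(z, 1 - (1 - eps) rho(1 - z)) of Eq. (9.16)."""
    z_hat = 1.0 - (1.0 - eps) * (1.0 - z) ** (k - 1)
    return (l * z * (1.0 - z_hat)
            - (l / k) * (1.0 - eps) * (1.0 - (1.0 - z) ** k)
            + z_hat ** l)

def ldgm_bec_thresholds(l, k, grid=200_000, iters=200):
    # eps_BP: largest eps for which a fixed point with z < 1 exists (grid search).
    z_bp = max((i / grid for i in range(1, grid)),
               key=lambda z: eps_on_branch(z, l, k))
    eps_bp = eps_on_branch(z_bp, l, k)

    def gap(z):
        # psi(z_good) - psi(z_bad = 1) along the branch of fixed points
        eps = eps_on_branch(z, l, k)
        return psi(z, eps, l, k) - psi(1.0, eps, l, k)

    # eps_MAP: the gap changes sign on the stable branch 0 < z < z_bp (bisection).
    lo, hi = 1e-12, z_bp
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return eps_bp, eps_on_branch(lo, l, k)

eps_bp, eps_map = ldgm_bec_thresholds(8, 4)
print(f"eps_BP ~ {eps_bp:.7f}, eps_MAP ~ {eps_map:.7f}")
```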

The interpretation of these results is straightforward. Maximum likelihood decoding is controlled by the stationary point $z_{\rm bad} = 1$ for $\epsilon > \epsilon_{\rm MAP}$. In this regime the lower bound (9.16) yields the same conditional entropy as for the random code ensemble. We expect the bit error rate in this regime to be $P_b(\epsilon) = 1/2$. At low noise ($\epsilon < \epsilon_{\rm MAP}$) the fixed point $z_{\rm good}(\epsilon)$ controls the MAP performance. Analogously to what was argued in the previous Subsection, we expect this to imply a bit error rate $P_b(\epsilon) = \Lambda(1-(1-\epsilon)\rho(1-z_{\rm good}(\epsilon)))$.

As for BP decoding, it has a unique fixed point $z_{\rm bad} = 1$ for $\epsilon > \epsilon_{\rm BP}$. This corresponds to a high bit error rate $P_b(\epsilon) = 1/2$. A second, locally stable, fixed point appears at $\epsilon_{\rm BP}$. If BP is initialized using only erased messages (as is usually done), all the messages remain erased (BP does not know where to start from). The same remains true if a small number of non-erased (correct) messages is introduced: density evolution is still controlled by the $z_{\rm bad}$ fixed point. If however the initial condition contains a large enough fraction (namely, larger than $1 - z_{\rm inst}$) of correct messages, the small-$P_b$ fixed point $z_{\rm good}$ is eventually reached.

Let us finally notice that the present results can be shown to be consistent with the ones of Refs. [37,38].

9.4 General channel: a simple minded bound 

The previous Section suggests a simple bound for the LDPC($n, \Lambda, P$) ensemble on a general BIOS channel. Take, as for the BEC case, $V = 0$ with probability $z$ and $= \infty$ with probability $1 - z$, while $U = 0$ with probability $\hat z$ and $= \infty$ with probability $1 - \hat z$. These conditions are consistent with the admissibility requirement (6.1) if

$\hat z = 1 - \rho(1-z)$. (9.18)

Plugging into Eq. (6.2) we get a bound of the same form as for the BEC, cf. Eq. (9.9), with

$\phi(z,\hat z) = \Lambda'(1)\, z(1-\hat z) - \frac{\Lambda'(1)}{P'(1)}\,[1 - P(1-z)] + [1 - C(Q)]\,\Lambda(\hat z)$. (9.19)

Passing from the BEC to a general BIOS channel amounts, under this simple ansatz, to substituting $1 - C(Q)$ for $\epsilon$.
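As a worked instance of this substitution (added here for illustration, and not taken from the text), consider the binary symmetric channel BSC($p$), for which the capacity is $C(Q) = 1 - h_2(p)$ with $h_2$ the binary entropy function. The numerical value in the last comment is an approximate inversion of $h_2$ and should be read as indicative.

```latex
% Worked instance of Eq. (9.19) for the BSC(p), assuming C(Q) = 1 - h_2(p).
\begin{align*}
1 - C(Q) &= h_2(p) = -p\log_2 p - (1-p)\log_2(1-p), \\
\phi(z,\hat z) &= \Lambda'(1)\, z(1-\hat z)
   - \frac{\Lambda'(1)}{P'(1)}\,\bigl[1 - P(1-z)\bigr]
   + h_2(p)\,\Lambda(\hat z).
\end{align*}
% For the (3,6) ensemble the supremum becomes positive as soon as
% h_2(p) exceeds the BEC value \epsilon_{MAP} \approx 0.4881508,
% that is for p larger than roughly 0.106.
```

The resulting threshold estimate lies below the Shannon limit for rate 1/2 but is weaker than the optimized bound reported in Table 2, as one would expect from such a restricted ansatz.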



9.5 General channel: optimizing the bound 

We saw in Sec. 9.3 that, for the BEC, stationary points of the trial entropy function correspond to fixed points of the density-evolution equations. This fact is indeed quite general and holds for a general BIOS channel.

In order to discuss this point, it is useful to have a concrete representation for the random variables $U$, $V$ entering in the definition of the trial entropy (6.2). A first possibility is to identify them with the distributions $\mathsf{U}(x) = P[U \le x]$ and $\mathsf{V}(x) = P[V \le x]$ as explained in [30]. The distributions are right continuous, non-decreasing functions $\mathsf{A}(x)$ such that $\lim_{x\to-\infty}\mathsf{A}(x) = 0$ and $\lim_{x\to+\infty}\mathsf{A}(x) \le 1$. Vice versa, to any such function we may associate a well defined random variable. It is convenient to introduce the densities $\mathsf{u}(x)$ and $\mathsf{v}(x)$, which are formal Radon-Nikodym derivatives of $\mathsf{U}(x)$ and $\mathsf{V}(x)$. We also introduce the log-likelihood distributions associated to the channel output, cf. Eq. (3.9): $\mathsf{J}(x) = P[J \le x]$ and $\mathsf{H}(x) = P[h \le x]$. The corresponding densities will be denoted by $\mathsf{j}(x)$ and $\mathsf{h}(x)$.

The admissibility condition (6.1) translates of course into a condition on the distributions $\mathsf{U}(x)$ and $\mathsf{V}(x)$. Following once again [30], we express this condition through 'G-distributions'. More precisely, for any number $x\in(-\infty,+\infty]$, we define $\gamma(x) = (\gamma_1(x), \gamma_2(x)) \in \{\pm 1\}\times[0,+\infty)$ by taking $\gamma_1(x) = {\rm sign}\, x$ and $\gamma_2(x) = -\log|\tanh x|$. We define $\Gamma$ to be the change-of-measure operator associated to the mapping $\gamma$. If $X$ is a random variable with distribution $\mathsf{A}$ and (formal) density $\mathsf{a}$, $\Gamma(\mathsf{a})$ is defined to be the density associated to $\gamma(X)$. Despite the notation, $\Gamma$ is defined on distributions, and only formally on densities. The action of $\Gamma$ is described in detail in [30]. Among its other properties, it has a well defined inverse $\Gamma^{-1}$. We can now write the condition implied by (6.1) in a compact form:

$\mathsf{u} = \sum_{k=2}^{k_{\max}} \rho_k\, \Gamma^{-1}\bigl[\Gamma(\mathsf{j}) \otimes \Gamma(\mathsf{v})^{\otimes(k-1)}\bigr] \equiv \rho_{\mathsf{j}}(\mathsf{v})$. (9.20)

A second concrete representation is obtained by inverting the distributions $\mathsf{U}(x)$, $\mathsf{V}(x)$. Of course this is not possible unless $\mathsf{U}(x)$, $\mathsf{V}(x)$ are continuous and increasing. However, we can always define the 'inverse distributions':

$\mathsf{U} : (0,1) \to (-\infty,+\infty]$ (9.21)
$\xi \mapsto \mathsf{U}(\xi) = \min\{x \mbox{ such that } \mathsf{U}(x) \ge \xi\}$, (9.22)

with an analogous definition for $\mathsf{V}(\xi)$. We introduce analogously the inverse distributions $\mathsf{J}(\xi)$ and $\mathsf{H}(\xi)$ for the log-likelihoods $J_a$ and $h_i$, cf. Eq. (3.9). Notice that, given a real valued random variable $X$, its inverse distribution $\mathsf{A}(\xi)$ is non-decreasing and left-continuous. We shall denote by $\mathcal{A}$ the space of inverse distributions. Moreover, the inverse distribution has a simple practical meaning: if one is able to sample $\xi$ uniformly in $(0,1)$, then $\mathsf{A}(\xi)$ is distributed like $X$. From this observation it follows that any inverse distribution $\mathsf{A}(\xi)$ uniquely determines its associated random variable.
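The sampling interpretation of the inverse distribution is easy to illustrate. The sketch below builds the generalized inverse $\xi \mapsto \min\{x : P[X \le x] \ge \xi\}$ for a random variable with finitely many atoms (possibly including $+\infty$) and uses it to sample; the BEC-like example, with mass $\epsilon$ at 0 and $1-\epsilon$ at $+\infty$, and all names are assumptions made for this illustration.

```python
import bisect
import math
import random

def make_inverse_distribution(atoms):
    """atoms: list of (value, probability) pairs; values may include math.inf.
    Returns the map xi -> min{x : P[X <= x] >= xi} for xi in (0, 1)."""
    atoms = sorted(atoms)
    values = [x for x, _ in atoms]
    cdf, total = [], 0.0
    for _, p in atoms:
        total += p
        cdf.append(total)

    def inverse(xi):
        idx = bisect.bisect_left(cdf, xi)          # first index with cdf >= xi
        return values[min(idx, len(values) - 1)]   # guard against rounding in cdf

    return inverse

eps = 0.4
U_inv = make_inverse_distribution([(0.0, eps), (math.inf, 1.0 - eps)])

# Sampling xi uniformly and applying the inverse reproduces the law of the variable.
rng = random.Random(0)
samples = [U_inv(rng.random()) for _ in range(10_000)]
print(sum(1 for s in samples if s == 0.0) / len(samples))
```

Using `bisect_left` makes the map left-continuous, in agreement with the property stated above: at $\xi = \epsilon$ the inverse still returns 0.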
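We can now re-express the trial entropy (6.2) as a functional over $\mathcal{A}$, $\phi = \phi(\mathsf{U},\mathsf{V})$, using the above correspondence (see footnote 5). After some straightforward computations we get

$\phi(\mathsf{U},\mathsf{V}) = -\Lambda'(1)\int \log_2\bigl[1 + \tanh\mathsf{U}(\xi_1)\tanh\mathsf{V}(\xi_2)\bigr]\, d\xi_1 d\xi_2$
$\qquad + \sum_l \Lambda_l \int \log_2\Bigl\{\sum_{\sigma=\pm 1}(1+\sigma\tanh\mathsf{H}(\xi_0))\prod_{i=1}^{l}(1+\sigma\tanh\mathsf{U}(\xi_i))\Bigr\}\prod_{i=0}^{l} d\xi_i$
$\qquad + \frac{\Lambda'(1)}{P'(1)}\sum_k P_k \int \log_2\Bigl\{1 + \tanh\mathsf{J}(\xi_0)\prod_{i=1}^{k}\tanh\mathsf{V}(\xi_i)\Bigr\}\prod_{i=0}^{k} d\xi_i \; - \; C(Q_V) - \frac{\Lambda'(1)}{P'(1)}\, C(Q_C)$, (9.23)

with all the integrals on the $\xi_i$'s being on the interval $(0,1)$. This representation allows to easily derive the following result.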

Lemma 6 Assume the supremum of the trial free energy $\phi_V$ over the space of admissible random variables is achieved for some couple $(U, V)$. Then

$V \stackrel{\rm d}{=} h + \sum_{i=1}^{l} U_i$, (9.24)

$l$ being a Poisson random variable of parameter $\gamma$ and $h$ distributed according to its definition, cf. Eq. (3.9).

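Proof: Look at $\phi$ as a functional $(\mathsf{U},\mathsf{V}) \to \phi(\mathsf{U},\mathsf{V})$ of the inverse distributions $\mathsf{U}$ and $\mathsf{V}$. The idea is to differentiate this functional at its (assumed) maximum. Let $\mathsf{D} : (0,1) \to (-\infty,+\infty]$ be left continuous and non-decreasing. It is an easy calculus exercise to show that

$\frac{d}{d\varepsilon}\phi(\mathsf{U}+\varepsilon\mathsf{D},\mathsf{V})\Big|_{\varepsilon=0} = -\Lambda'(1)\int\bigl[1-\tanh^2\mathsf{U}(\xi)\bigr]\,\mathsf{D}(\xi)\,\mathcal{F}_\xi(\mathsf{U},\mathsf{V})\, d\xi$, (9.25)

$\frac{d}{d\varepsilon}\phi(\mathsf{U},\mathsf{V}+\varepsilon\mathsf{D})\Big|_{\varepsilon=0} = -\Lambda'(1)\int\bigl[1-\tanh^2\mathsf{V}(\xi)\bigr]\,\mathsf{D}(\xi)\,\mathcal{G}_\xi(\mathsf{U},\mathsf{V})\, d\xi$, (9.26)

where

$\mathcal{F}_\xi(\mathsf{U},\mathsf{V}) = \int\frac{\tanh\mathsf{V}(\xi_1)}{1+\tanh\mathsf{U}(\xi)\tanh\mathsf{V}(\xi_1)}\, d\xi_1 - E_l\int\frac{\tanh[\mathsf{H}(\xi_0)+\sum_{i=1}^{l}\mathsf{U}(\xi_i)]}{1+\tanh\mathsf{U}(\xi)\tanh[\mathsf{H}(\xi_0)+\sum_{i=1}^{l}\mathsf{U}(\xi_i)]}\prod_{i=0}^{l} d\xi_i$, (9.27)

$\mathcal{G}_\xi(\mathsf{U},\mathsf{V}) = \int\frac{\tanh\mathsf{U}(\xi_1)}{1+\tanh\mathsf{U}(\xi_1)\tanh\mathsf{V}(\xi)}\, d\xi_1 - \sum_k\rho_k\int\frac{\tanh\mathsf{J}(\xi_0)\prod_{i=1}^{k-1}\tanh\mathsf{V}(\xi_i)}{1+\tanh\mathsf{J}(\xi_0)\prod_{i=1}^{k-1}\tanh\mathsf{V}(\xi_i)\,\tanh\mathsf{V}(\xi)}\prod_{i=0}^{k-1} d\xi_i$, (9.28)

with $l$ distributed as in the statement of the Lemma. Both derivatives are understood up to the positive constant $\log_2 e$ coming from the base-2 logarithms, which plays no role in what follows.

Notice that $\mathcal{G}_\xi(\mathsf{U},\mathsf{V})$ vanishes because of the admissibility condition (6.1). It is then straightforward to show that $\mathcal{F}_\xi(\mathsf{U},\mathsf{V})$ must vanish for any $\xi$ such that $\mathsf{U}(\xi) < \infty$, in order for $(\mathsf{U},\mathsf{V})$ to be a maximum. This in turn implies the thesis. □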



5 Since throughout this Section the degree sequences are kept fixed, we shall drop the dependence of $\phi$ on $(\Lambda, P)$ and (with an abuse of notation) replace it with its dependence upon $\mathsf{U}$ and $\mathsf{V}$.
  k    l    $r_{\rm des}$    New UB       Gallager UB   Gallager LB   Shannon limit
  4    3    1/4      0.2101(1)    0.2109164     0.2050273     0.2145018
  5    3    2/5      0.1384(1)    0.1397479     0.1298318     0.1461024
  6    3    1/2      0.1010(2)    0.1024544     0.0914755     0.1100279
  6    4    1/3      0.1726(1)    0.1726268     0.1709876     0.1739524

Table 2: Thresholds for regular LDPC codes over the binary symmetric channel BSC($p$) ($k$ and $l$ are, respectively, the check and variable node degrees, and $r_{\rm des}$ the design rate). The new upper bound proved in this paper is evaluated numerically following the approach described in the text. The quoted error comes from Monte Carlo sampling of the random variables $U$ and $V$. 'Gallager UB' and 'Gallager LB' refer respectively to the upper and lower bounds obtained by Gallager [1].



9.6 Numerical estimates and comparison with previous bounds 

The discussion in the previous Section suggests a natural possibility for evaluating numerically the lower bound in Theorem 2. Run density evolution [3] for $T$ iterations and then evaluate the trial entropy (6.2) taking $U$ and $V$ to be random variables with the density of (respectively) right-to-left or left-to-right messages. Notice that, in order for Eq. (6.1) to be satisfied, the right-to-left density must be updated one last time before evaluating the trial entropy.

This still leaves a lot of freedom. The first question is: how large should $T$ (the number of iterations) be? While it is difficult to provide a quantitative answer, in order to approach the supremum in Eqs. (6.3) to (6.5), one should get a good approximation of the fixed point densities. Generically, this happens only as $T$ is let to infinity.

The next question is: how should the densities be initialized? This question has a very simple answer in usual applications of density evolution: just initialize to the message density seen at the zeroth step of message passing. This generally means $U$, $V$ identically equal to 0. Hereafter, we shall refer to this as the '0-initialization'. This answer is no longer complete in the present context. In fact any initial condition such that $U$ and $V$ are symmetric random variables corresponds eventually to a valid lower bound of the form in Eqs. (6.3) to (6.5). At least one other simple initial condition consists in taking $U = V = +\infty$ identically. In the case of standard ensembles with minimum left degree at least 2 this is in fact a fixed point and the corresponding trial entropy vanishes. We shall refer to this as the '$\infty$-initialization'.

Despite this freedom, Eqs. (6.3) to (6.5) always provide a lower bound, no matter how we implement the general strategy. In Tab. 2 we report the numerically-evaluated upper bound for a few regular ensembles over the binary symmetric channel. We implemented a sampled (Monte Carlo) version of density evolution (with $10^4$ to $10^5$ sampling points) and adopted the 0-initialization. The trial entropy (6.2) was averaged over 10 iterations after $10^4$ equilibration iterations. The threshold was estimated as the smallest noise parameter such that the trial entropy is positive.
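A sampled version of density evolution can be sketched in a few lines. The code below is only an illustration and not the implementation used for Table 2: it assumes a regular $(l, k)$ LDPC ensemble on the BSC($p$), the all-zero codeword, half-log-likelihood messages (so that the check rule involves $\tanh$ of the message itself), and a much smaller population than the one quoted above. It adopts the 0-initialization and updates the right-to-left population one last time before returning; the evaluation of the trial entropy (6.2) on the resulting populations is not included. All names are hypothetical.

```python
import math
import random

def sampled_density_evolution(p, l=3, k=6, pop=2000, iters=30, seed=0):
    """Return populations (u, v) of right-to-left and left-to-right messages."""
    rng = random.Random(seed)
    h0 = 0.5 * math.log((1.0 - p) / p)

    def channel():
        # log-likelihood of a BSC(p) output when the all-zero codeword is sent
        return h0 if rng.random() > p else -h0

    def check_update(v):
        # u = arctanh( prod_{j=1}^{k-1} tanh v_j ), inputs drawn from the population
        new_u = []
        for _sample in range(pop):
            t = 1.0
            for _edge in range(k - 1):
                t *= math.tanh(rng.choice(v))
            t = max(-1.0 + 1e-15, min(1.0 - 1e-15, t))   # keep arctanh finite
            new_u.append(math.atanh(t))
        return new_u

    def variable_update(u):
        # v = h + sum_{b=1}^{l-1} u_b, inputs drawn from the population
        return [channel() + sum(rng.choice(u) for _edge in range(l - 1))
                for _sample in range(pop)]

    u = [0.0] * pop                 # '0-initialization'
    v = variable_update(u)
    for _ in range(iters):
        u = check_update(v)
        v = variable_update(u)
    u = check_update(v)             # last right-to-left update, cf. Eq. (6.1)
    return u, v

def error_fraction(v):
    """Weight of negative messages (ties counted as one half)."""
    return sum(1.0 if x < 0 else 0.5 if x == 0 else 0.0 for x in v) / len(v)

for p in (0.05, 0.12):
    _, v = sampled_density_evolution(p)
    print(p, error_fraction(v))
```

Clipping the argument of `atanh` caps the message magnitude at a large finite value, which stands in for $+\infty$ messages in this finite-precision sketch.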

In the same Table (and analogously in Table 1 for the erasure channel), we compare our upper bounds with previous upper and lower bounds. In his thesis [1], Gallager used an estimate of the distance spectrum, together with a clever modification of the union bound, in order to obtain lower bounds. This technique was further generalized and refined over the years, see for instance [8,10,11], but it is fair to say that there was no major modification of the initial result for the simplest regular ensembles and channel models. We should stress that a different technique, based on typical pairs decoding, was proposed in [7]. Our evaluation of the Gallager bound was not numerically distinguishable from the results of [7].

As for upper bounds, Gallager's approach is based on an information theoretic argument. Also in this case, despite some improvements [9], the main idea remained essentially unchanged. Moreover, the quantitative estimates by Gallager remained essentially the state of the art for the simplest regular ensembles and channel models considered here.

Although the various estimates in Tables 1 and 2 are numerically close (which is partially due to the proximity of capacity), the bound of Theorem 2 is clearly superior to previous upper bounds.



9.7 Relation with the Bethe free energy 

Until now we studied the average properties of the code ensembles defined in Sec. 2. Although the concentration result established earlier in the paper justifies this approach, it may be interesting to take a step backward and consider the decoding problem for a single realization of the code and of the channel. It is convenient to introduce the 'Bethe free energy' [39] $F_B(b)$ associated to the probability distribution (3.3). We have

$F_B(b) = U_B(b) - H_B(b)$, (9.29)

where

$U_B(b) = -\sum_{a\in\mathcal{C}}\sum_{x_a} b_a(x_a)\log_2 Q_C(y_a|x_a) - \sum_{i\in\mathcal{V}}\sum_{x_i} b_i(x_i)\log_2 Q_V(y_i|x_i)$, (9.30)

$H_B(b) = -\sum_{a\in\mathcal{C}}\sum_{x_a} b_a(x_a)\log_2 b_a(x_a) + \sum_{i\in\mathcal{V}}(|\partial i| - 1)\sum_{x_i} b_i(x_i)\log_2 b_i(x_i)$, (9.31)

and we used the shorthands $x_a = (x_{i_1},\dots,x_{i_k})$ and $Q_C(y_a|x_a) = Q_C(y_a|x_{i_1}\oplus\cdots\oplus x_{i_k})$. The parameters $\{b_a(x_a) : a\in\mathcal{C}\}$ and $\{b_i(x_i) : i\in\mathcal{V}\}$ are probability distributions subject to the marginalization conditions

$\sum_{x_j,\, j\in\partial a\setminus i} b_a(x_a) = b_i(x_i) \qquad \forall i\in\partial a$, (9.32)

$\sum_{x_i} b_i(x_i) = 1 \qquad \forall i$. (9.33)

For LDGM codes $F_B(b)$ is always finite. For LDPC codes, it takes values in $(-\infty,+\infty]$ and is finite if $b_a(x_a)$ vanishes whenever $x_{i_1}\oplus\cdots\oplus x_{i_k} = 1$ (as always we use the convention $0\log 0 = 0$). Moreover, as explained in [39], its stationary points are fixed points of BP decoding.

Following [39], we consider the stationary points of the Bethe free energy (9.29) under the constraints (9.32), (9.33). This can be done by introducing a set of Lagrange multipliers $\{\lambda_{ia}(x_i)\}$ for Eq. (9.32), the constraint (9.33) being easily satisfied by normalizing the beliefs. One then considers the Lagrange function

$L_B(b,\lambda) = F_B(b) - \sum_{(ia)\in\mathcal{E}}\sum_{x_i}\lambda_{ia}(x_i)\Bigl[\sum_{x_j :\, j\in\partial a\setminus i} b_a(x_a) - b_i(x_i)\Bigr]$. (9.34)

We refer to [39] for further details of this computation. Stationary points are eventually shown to have the form

$b_a(x_a) = \frac{1}{z_a}\, Q_C(y_a|x_a)\prod_{j\in\partial a} P_{v_{j\to a}}(x_j)$, (9.35)

$b_i(x_i) = \frac{1}{z_i}\, Q_V(y_i|x_i)\prod_{a\in\partial i} P_{u_{a\to i}}(x_i)$, (9.36)

with $P_{\cdot}(x)$ being defined as in Eq. (4.9). The $z_a$'s and $z_i$'s are fixed by the normalization conditions $\sum_{x_a} b_a(x_a) = 1$ and $\sum_{x_i} b_i(x_i) = 1$. The messages $\{v_{i\to a}\}$ are related to the Lagrange multipliers $\{\lambda_{ia}(x_i)\}$ by the relation

$P_{v_{i\to a}}(x_i) \propto \exp\{\lambda_{ia}(x_i)\}$, (9.37)

while the $\{u_{a\to i}\}$ must satisfy the equation

$v_{i\to a} = h_i + \sum_{b\in\partial i\setminus a} u_{b\to i}$, (9.38)

for any $i\in\mathcal{V}$. The marginalization constraint (9.32) is satisfied if the equation

$u_{a\to i} = {\rm arctanh}\Bigl\{\tanh J_a\prod_{j\in\partial a\setminus i}\tanh v_{j\to a}\Bigr\}$ (9.39)

holds for any $a\in\mathcal{C}$.
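The two local update rules can be written down directly. The sketch below implements the check-node rule (9.39) and the usual BP variable-node rule, $v = h_i + \sum_b u_b$ over the other neighbours of $i$, which is assumed here to be the content of Eq. (9.38). The function names and the clipping constant are choices made for this illustration.

```python
import math

def check_to_variable(J_a, v_incoming):
    """Eq. (9.39): u = arctanh( tanh(J_a) * prod_j tanh(v_j) ),
    with j running over the neighbours of check a other than the target."""
    t = math.tanh(J_a)
    for v in v_incoming:
        t *= math.tanh(v)
    t = max(-1.0 + 1e-15, min(1.0 - 1e-15, t))   # arctanh diverges at +-1
    return math.atanh(t)

def variable_to_check(h_i, u_incoming):
    """Assumed form of Eq. (9.38): v = h_i + sum_b u_b,
    with b running over the neighbours of variable i other than the target."""
    return h_i + sum(u_incoming)

# LDPC parity check (J_a = +infinity) with two other incoming messages
print(check_to_variable(math.inf, [0.5, 0.8]))
print(variable_to_check(0.3, [0.1, -0.2]))
```

A single vanishing incoming message makes the check output vanish, which is the message-passing counterpart of an erased bit blocking a parity check.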

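If we substitute the beliefs (9.35), (9.36) into Eq. (9.29) we can express the Bethe free energy as a function of the messages $u = \{u_{a\to i}\}$ and $v = \{v_{i\to a}\}$. Using Eqs. (9.38), (9.39), we get the following expression (with a slight abuse of notation we do not change the symbol denoting the free energy)

$F_B(u,v) = \sum_{(ia)\in\mathcal{E}}\log_2\Bigl[\sum_{x_i} P_{u_{a\to i}}(x_i)P_{v_{i\to a}}(x_i)\Bigr] - \sum_{a\in\mathcal{C}}\log_2\Bigl[\sum_{x_a} Q_C(y_a|x_a)\prod_{i\in\partial a}P_{v_{i\to a}}(x_i)\Bigr] - \sum_{i\in\mathcal{V}}\log_2\Bigl[\sum_{x_i} Q_V(y_i|x_i)\prod_{a\in\partial i}P_{u_{a\to i}}(x_i)\Bigr]$. (9.40)

A simple comparison of this expression with Eq. (6.2) yields the following interesting result.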

Proposition 1 Let $F_B(u,v)$ be the Bethe free energy for any of the code ensembles LDGM($n, \gamma, P$), LDPC($n, \gamma, P$), LDGM($n, \Lambda, P$), LDPC($n, \Lambda, P$), with the beliefs parameterized as in Eqs. (9.35) and (9.36), and assume that the messages are i.i.d. random variables $u_{a\to i} \stackrel{\rm d}{=} U$ and $v_{i\to a} \stackrel{\rm d}{=} V$. Then

$\lim_{n\to\infty}\frac{1}{n}\, E\, F_B(u,v) = -\phi_V(\Lambda,P) + \kappa\sum_y Q(y|0)\log_2 Q(y|0)$, (9.41)

where the expectation $E(\cdot)$ is taken both with respect to the messages distribution and the code ensemble, and $\kappa = \gamma/P'(1)$ (for LDGM ensembles) or $\kappa = 1$ (for LDPC ensembles). For multi-Poisson ensembles LDGM($n, \gamma, \Lambda, P$) and LDPC($n, \gamma, \Lambda, P$), the same formula holds with $\Lambda$ being replaced by $\Lambda^{(\gamma)}$ on the right hand side.
Proof: In order to compute the expectation on the left hand side of Eq. (9.41), let us proceed in two steps. In a first step, we shall take the expectation with respect to the messages $\{u_{a\to i}, v_{i\to a}\}$, which in the statement of the Proposition are assumed to be i.i.d., as well as with respect to the channel output $\{y_i, y_a\}$. Let us denote by $\mathcal{V}_l$ ($\mathcal{C}_k$) the set of variable nodes (check nodes) of degree $l$ (degree $k$). By linearity of expectation, we get

$E_{u,v,y}F_B(u,v) = |\mathcal{E}|\, E_{u,v}\log_2\Bigl[\sum_x P_u(x)P_v(x)\Bigr] - \sum_l|\mathcal{V}_l|\, E_y E_u\log_2\Bigl[\sum_x Q_V(y|x)\prod_{a=1}^{l}P_{u_a}(x)\Bigr] - \sum_k|\mathcal{C}_k|\, E_y E_v\log_2\Bigl[\sum_{x_1\dots x_k} Q_C(y|x_1\oplus\cdots\oplus x_k)\prod_{i=1}^{k}P_{v_i}(x_i)\Bigr]$. (9.42)

Now notice that the number of edges is equal to the number of variable nodes times the average left degree: $|\mathcal{E}| = n\Lambda'(1)$. The number of variable nodes of degree $l$ is, by definition, $|\mathcal{V}_l| = n\Lambda_l$. Furthermore the total number of check nodes is $n\Lambda'(1)/P'(1)$, and therefore $|\mathcal{C}_k| = (n\Lambda'(1)/P'(1))P_k$. Finally, both for Poisson and standard ensembles, the expected degree profile converges to the design profile, see Sec. 2.2. In other words

$\lim_{n\to\infty} E_{\mathcal{G}}\Lambda_l = \Lambda_l$, $\quad\lim_{n\to\infty} E_{\mathcal{G}}P_k = P_k$, $\quad\lim_{n\to\infty} E_{\mathcal{G}}\Lambda'(1) = \Lambda'(1)$, (9.43)

where the quantities inside the expectations refer to the actual degree profile of the random graph and those on the right hand sides to the design profile. Therefore (9.41) is proved by taking the expectation with respect to the graph ensemble in Eq. (9.42) and then taking the large blocklength limit (see footnote 6).

Finally, for the multi-Poisson ensemble we have just to notice that the expected left degree profile converges to $\Lambda^{(\gamma)}$ rather than to $\Lambda$, see App. B. □
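The counting identities used in the proof are simple enough to check on an example. The sketch below represents the node-perspective degree profiles as dictionaries and returns $|\mathcal{E}|$, $|\mathcal{V}_l|$ and $|\mathcal{C}_k|$ for a graph with $n$ variable nodes; the function name and the data layout are assumptions made for this illustration.

```python
from fractions import Fraction

def graph_counts(n, Lambda, P):
    """Lambda, P: dicts mapping a degree to the fraction of variable / check
    nodes having that degree (node perspective)."""
    avg_left = sum(l * w for l, w in Lambda.items())      # Lambda'(1)
    avg_right = sum(k * w for k, w in P.items())          # P'(1)
    edges = n * avg_left                                  # |E| = n Lambda'(1)
    variables = {l: n * w for l, w in Lambda.items()}     # |V_l| = n Lambda_l
    n_checks = edges / avg_right                          # n Lambda'(1) / P'(1)
    checks = {k: n_checks * w for k, w in P.items()}      # |C_k|
    return edges, variables, checks

# (3, 6) regular ensemble with n = 1200 variable nodes
edges, variables, checks = graph_counts(1200, {3: Fraction(1)}, {6: Fraction(1)})
print(edges, variables, checks)
```

Exact rational arithmetic is used so that the counts come out as integers whenever the profile is compatible with $n$.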

This result provides an appealing interpretation for the trial entropy entering in Theorem 2. Apart from a simple rescaling, it is asymptotically equal to the expected value of the Bethe free energy when the messages $\{u_{a\to i}\}$ and $\{v_{i\to a}\}$ are i.i.d. random variables. Vice versa, Theorem 2 can be interpreted as yielding a connection between the Bethe free energy and the conditional entropy of the transmitted message.



10 Generalizations and conclusion 

We expect that the results derived in this paper can be extended in several directions. 

A first direction consists in proving the analogue of Theorem 2 for more general code ensembles. It is important to notice that the technique used in this paper (as well as in [26]) makes crucial use of the convexity of $P(x)$. Although the non-rigorous calculations of [17-20] suggest that the result will have the same form for a non-convex $P(x)$, the proof is probably more difficult in this case.

A second direction consists in proving that the bound of Theorem 2 is indeed tight. We make this claim precise as follows.

Conjecture 1 Under the hypotheses of Theorem 2 we have

$\lim_{n\to\infty} h_n = \sup_V \phi_V$, (10.1)

where the sup has to be taken over the space of admissible random variables. The degree sequences to be used as argument of $\phi_V$ are the same as in Theorem 2.

6 Notice that in the case of Poisson ensembles $\Lambda_l$ has no bounded support ($l$ can be arbitrarily large). However the thesis follows from convergence of $E_{\mathcal{G}}\Lambda_l$ in total variation distance.

Once again, this claim is supported by [17-20]. 

Finally, in this paper we limited ourselves to estimating the conditional entropy per channel use. As discussed in connection with Fano inequality, this implies only sub-optimal bounds on the bit error rate. It would therefore be important to estimate this quantity directly, without passing through Fano inequality. The results of [17-20] suggest the following recipe for computing the bit error rate under symbol MAP decoding. Determine the message densities maximizing the trial entropy, cf. Eq. (6.3). Compute the density of a posteriori log-likelihoods as in density evolution (this implies a convolution of all the densities incoming in a variable node). The bit error rate is simply given by the weight of negative log-likelihoods under this distribution.

Finally, one may hope that the strong connection between message passing techniques (density evolution) and MAP decoding (conditional entropy) highlighted in the present approach may lead to a better understanding of the former.
A Coupling graph ensembles

In this Appendix we prove the Lemma on the coupling of graph ensembles. Instead of exhibiting directly a coupling between a standard graph $\mathcal{G}_s \stackrel{\rm d}{=} (n,\Lambda,P)$ and a multi-Poisson graph $\mathcal{G}_{mP} \stackrel{\rm d}{=} (n,\Lambda,P,\gamma)$, we shall proceed in two steps. More precisely, we shall exhibit two couplings $(\mathcal{G}_s,\mathcal{G}_*)$ and $(\mathcal{G}_*,\mathcal{G}_{mP})$, where the distribution of $\mathcal{G}_* \stackrel{\rm d}{=} (n,\Lambda,P,\gamma)_*$ is defined below (as in Sec. 2.3 we let $t_{\max} = \lfloor\Lambda'(1)/\gamma\rfloor - 1$ be the number of rounds).

In order to generate a random element in $(n,\Lambda,P,\gamma)_*$, proceed as for the multi-Poisson ensemble (see its Definition), but with the following modification. During stage $t$, for each check node $a = (a,k,t)\in\mathcal{C}_t$, and for each $r = 1,\dots,k$, $i^a_r$ is chosen randomly in $\mathcal{V}$ with distribution $w_i(t;a,r) = (d_i(t) - \Delta_i(t;a,r))/[\sum_j (d_j(t) - \Delta_j(t;a,r))]$, where $\Delta_i(t;a,r)$ is the number of times $i$ has been already chosen during stage $t$. In other words, unlike in the multi-Poisson ensemble, we keep track faithfully of the number of free sockets.

Let us now describe how the two-step coupling works.

From $\mathcal{G}_{mP}$ to $\mathcal{G}_*$: Consider round $t$. Let $d^{mP}_i(t)$ and $d^*_i(t)$ be the number of free sockets respectively for $\mathcal{G}_{mP}$ and $\mathcal{G}_*$. Choose the variable nodes $(i^a_r)_{mP}$ and $(i^a_r)_*$ in the two graphs by coupling optimally (see discussion below) the distributions $w^{mP}_i(t) = [d^{mP}_i(t)]_+/(\sum_j [d^{mP}_j(t)]_+)$ and $w^*_i(t;a,r) = (d^*_i(t) - \Delta_i(t;a,r))/[\sum_j (d^*_j(t) - \Delta_j(t;a,r))]$. If $(i^a_r)_* \neq (i^a_r)_{mP}$, we say that a 'discrepancy' has occurred. We claim that, if $\gamma < 1$, then the total number of discrepancies is smaller than $A n\gamma^b$ w.h.p. ($A$ and $b$ being $n$- and $\gamma$-independent constants). The proof of this claim is provided below. Of course the number of rewirings necessary to pass from $\mathcal{G}_{mP}$ to $\mathcal{G}_*$ is bounded by twice the number of discrepancies.
From $\mathcal{G}_*$ to $\mathcal{G}_s$: Notice that $\mathcal{G}_*$ is generated in the same way as $\mathcal{G}_s$ (see discussion at the end of Sec. 2.1) but for the fact that it contains a random number of check nodes. In fact, the total number of check nodes of degree $k$ (call it $m_k$) in a $(n,\Lambda,P,\gamma)_*$ graph is a Poisson random variable with mean $\bar m_k = t_{\max}\, n\gamma P_k/P'(1)$. Denoting by $m^s_k = n\Lambda'(1)P_k/P'(1)$ the number of check nodes in a standard $(n,\Lambda,P)$ graph, it is easy to see that $m^s_k - 2Bn\gamma < \bar m_k < m^s_k - Bn\gamma$ for some positive ($n$- and $\gamma$-independent) constant $B$. By elementary properties of Poisson random variables, one obtains $m^s_k - 3Bn\gamma < m_k < m^s_k - (1/2)Bn\gamma$ for each $k\in\{2,\dots,k_{\max}\}$ with probability greater than $1 - 2e^{-Cn\gamma^2}$ for some constant $C$.

We therefore obtain the desired coupling as follows: first generate $\mathcal{G}_*$. If $m_k > m^s_k$ for any $k$, then generate an independent graph $\mathcal{G}_s$. In the opposite case, generate $\mathcal{G}_s$ by adding $m^s_k - m_k$ check nodes for each $k$ and connect them to variable nodes as described at the end of Sec. 2.1. Because of the above argument, the number of rewirings (check nodes added) is smaller than $A' n\gamma$ with high probability.

We are now left with the task of proving the claim in the first step. Before accomplishing this task, it is worth recalling an elementary fact which is useful in this proof [40]. Given two distributions $\{w^{(1)}_i\}$ and $\{w^{(2)}_i\}$ over $i\in[n]$, their total variation distance is defined as

$\|w^{(1)} - w^{(2)}\| = \frac{1}{2}\sum_{i=1}^{n}|w^{(1)}_i - w^{(2)}_i|$. (A.1)

Furthermore, if $i_1$ and $i_2$ are distributed according to $w^{(1)}$ and $w^{(2)}$, there exists a coupling between them (i.e. a joint distribution which has $w^{(1)}$, $w^{(2)}$ as marginals) such that $\|w^{(1)} - w^{(2)}\| = P(i_1\neq i_2)$. Such a coupling is 'optimal' in the sense that, for any coupling, we have $P(i_1\neq i_2) \ge \|w^{(1)} - w^{(2)}\|$.
The proof of the claim is obtained by recursively estimating the number of discrepancies between $\mathcal{G}_{mP}$ and $\mathcal{G}_*$. Suppose that we have terminated the first $t$ rounds (denoted as $0,\dots,t-1$ in the Definition of the multi-Poisson ensemble) in the generation of the couple $(\mathcal{G}_{mP},\mathcal{G}_*)$ and no more than $C_t n\gamma$ discrepancies occurred so far (with $C_t$ $n$-independent). This hypothesis trivially holds for $t = 0$. We will determine an $n$-independent constant $C_{t+1}$ such that, at the end of the $t$-th step, there will be, w.h.p., less than $C_{t+1}n\gamma$ discrepancies. By iterating this argument, we deduce that $\mathcal{G}_{mP}$ and $\mathcal{G}_*$ have less than $C_{t_{\max}}n\gamma$ discrepancies with high probability, and will be able to obtain the estimate $C_{t_{\max}} \le A + B\gamma^{-p}$ with $0 < p < 1$, $A, B > 0$ three $\gamma$-independent constants. This implies the Lemma with $b = 1 - p$.

During the round $t$, $(i^a_r)_{mP}$ and $(i^a_r)_*$ are taken from the distributions $w^{mP}_i(t)$ and $w^*_i(t;a,r)$ described above. Let us start by noticing that, with high probability

n n n 

J2\l<i? P (t)} + -d*(t) + Mt;a,r)\ < ^K mP W-d*(t)| + ^A 4 (t;a,r)< 
i=i i=i i=i 

< C t n-/ 2 t + 2n7 . (A. 2) 

The first inequality follows from the fact that $d^*_i(t)$ and $\Delta_i(t;a,r)$ are non-negative. The second one, from the induction hypothesis together with the observation that $\sum_{i=1}^{n}\Delta_i(t;a,r)$ is smaller than the total number of variable node choices during round $t$, which in turn is a Poisson random variable with mean $n\gamma$.
Next, we observe that, w.h.p.

$\sum_{i=1}^{n}[d^{mP}_i(t)]_+ \ge n(\Lambda'(1) - \gamma t - \gamma)$. (A.3)

In fact, at $t = 0$ the sum on the left-hand side has value $n\Lambda'(1)$, and after each round it decreases at most by the number of left sockets which are occupied during that round. This is a Poisson random variable of mean $n\gamma$.

Now we can estimate the variation distance between the distributions of $(i^a_r)_{mP}$ and $(i^a_r)_*$:



< C ^ t + 2 ^ < (A.4) 
" A'(l)- 7 (t + l) " 1 ' 

< (A,) 

where the second inequality follows from $t < t_{\max}$ and $t_{\max} \le \Lambda'(1)/\gamma - 1$. During round $t$, about $n\gamma$ couples $(i^a_r)_{mP}$ and $(i^a_r)_*$ are chosen and they differ with probability $\|w^{mP}(t) - w^*(t;a,r)\|$ (because we coupled $w^{mP}(t)$ and $w^*(t;a,r)$ optimally). The total number of discrepancies is therefore smaller than $2n\gamma\|w^{mP}(t) - w^*(t;a,r)\|$ with high probability. Unfortunately this estimate worsens as $t$ approaches $t_{\max}$ because of the denominator in (A.5). This problem is overcome as follows. Fix $t_* = t_{\max} - \lfloor t_{\max}^{p}\rfloor$, where $0 < p < 1$ is the solution of $p = 2\Lambda'(1)(1-p)$, and use the estimate (A.5) only for $t < t_*$. For $t_* \le t < t_{\max}$ we just use the fact that during each round no more than $n\gamma$ discrepancies can be introduced. In other words

r / C t + 2A'(1) (C t + A)/(W - t) if t < U, 1 a ^\ 

Gm "\Q + l ifi*<t<i max , (A - 6) 

where we introduced the constant A = 2/A'(l) > 0. This recursion is easily summed up, 
yielding 



io g (a + a) - ma) = E lo 4 1 + r^)^£r^7^ ( A - 7 ) 

£_q \ %ax & / '■max « 

< 2A'(1) £ i<2A'(l)log(S^). (A.8) 

Using the definition of $t_{\max}$ and the relation $2\Lambda'(1)(1-p) = p$, we get $C_{t_*} \le A + B\gamma^{-p}$ with $A$ and $B$ two $\gamma$-independent constants. Finally $C_{t_{\max}} \le C_{t_*} + \lfloor t_{\max}^{p}\rfloor \le A' + B'\gamma^{-p}$.



B Degree distribution for multi-Poisson ensembles

In this Appendix we provide an asymptotic characterization of the variable-node degree profile for the multi-Poisson ensemble $(n,\Lambda,P,\gamma)$. We shall start by defining a construction which yields the degree profile in the large blocklength limit and then prove convergence to this construction.
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Let fij j be a sequence of distributions over Z € {2, . . . , Z max } and d 6 Z indexed by t S 
{0, . . . , t max } and denned as follows. Introduce the kernel 

W t (A\d) = e-^^, X(6) = - 7[ g + ,„ ■ (B.l) 



Next define $\Omega^{(t)}_{l,d}$ recursively as

$$ \Omega^{(t+1)}_{l,d} = \sum_{d' \ge d} \Omega^{(t)}_{l,d'}\, W_t(d'-d\,|\,d') , \qquad \Omega^{(0)}_{l,d} = \Lambda_l\, \mathbb{I}_{d=l} , \tag{B.2} $$

where $\mathbb{I}_A$ is the indicator function of the event $A$. Notice that the sum in the denominator of Eq. (B.1) is always well-defined. In fact from the definition it follows that $\Omega^{(t)}_{l,d} = 0$ if $d > l_{\max}$. Finally, we define the asymptotic degree profile to be

$$ \Lambda^{(\gamma)}_l = \sum_{l'} \Omega^{(t_{\max})}_{l',\,l'-l} . \tag{B.3} $$

The following result implies that $\{\Lambda^{(\gamma)}_l\}$ is in fact the correct asymptotic degree profile.

Lemma 7 Let $\{\Lambda_l : l = 2,\dots,l_{\max}\}$ be the variable nodes degree profile for a random Tanner graph from the $(n, \Lambda, P, \gamma)$ multi-Poisson ensemble. Denote by $\|\mu - \nu\|$ the total variation distance between distributions $\mu$ and $\nu$ (see previous Section). If $\{\Lambda^{(\gamma)}_l\}$ is defined as above, then there exist $A, B > 0$ such that

(I). $\lim_{n\to\infty} \mathbb{E}\Lambda_l = \Lambda^{(\gamma)}_l$, (B.4)

(II). $\lim_{n\to\infty} \|\mathbb{E}\Lambda - \Lambda^{(\gamma)}\| = 0$, (B.5)

(III). $\mathbb{P}\{|\Lambda_l - \Lambda^{(\gamma)}_l| > \epsilon\} \le A\, e^{-Bn\epsilon^2}$, (B.6)

for any positive $\epsilon$.
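The construction preceding the Lemma can be iterated numerically. The sketch below assumes that the kernel is Poisson with mean $\lambda(d) = \gamma [d]_+ / \sum \Omega [d']_+$ and that a node with initial value $l$ and final value $d$ has degree $l - d$, as stated in the proof. The degree profile, the value of $\gamma$, the number of rounds and the truncation `d_min` are illustrative assumptions.

```python
import math

def poisson_pmf(lam, k):
    """Poisson probability of k events with mean lam (lam = 0 handled explicitly)."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def evolve_omega(Lambda, gamma, rounds, d_min=-30):
    """Iterate the recursion for Omega_{l,d}, starting from Lambda_l * 1{d = l}.
    Values d < d_min are discarded (numerical truncation, an assumption)."""
    omega = {(l, l): frac for l, frac in Lambda.items()}
    for _ in range(rounds):
        denom = sum(w * max(d, 0) for (l, d), w in omega.items())
        new = {}
        for (l, d_old), w in omega.items():
            lam = gamma * max(d_old, 0) / denom if denom > 0 else 0.0
            for delta in range(d_old - d_min + 1):
                p = poisson_pmf(lam, delta)
                if p > 0.0:
                    key = (l, d_old - delta)
                    new[key] = new.get(key, 0.0) + w * p
        omega = new
    return omega

def degree_profile(omega):
    """Collect the mass of nodes with final degree l - d."""
    profile = {}
    for (l, d), w in omega.items():
        profile[l - d] = profile.get(l - d, 0.0) + w
    return profile

# Hypothetical example: half of the nodes start with l = 2, half with l = 4.
omega = evolve_omega({2: 0.5, 4: 0.5}, gamma=0.1, rounds=29)
profile = degree_profile(omega)
mean_degree = sum(deg * w for deg, w in profile.items())
print(sorted(profile.items())[:8])
print(sum(profile.values()), mean_degree)
```

Each round removes on average $\gamma$ sockets per node, so after 29 rounds with $\gamma = 0.1$ the mean degree of the resulting profile is $2.9$ up to truncation error, while the total mass stays equal to one.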



Proof: Notice that (III) obviously implies (I). Moreover (see proof of Corollary) if a sequence of distributions over the integers $\mu^{(n)}$ converges pointwise to a (normalized) distribution $\mu^{(\infty)}$, then $\|\mu^{(n)} - \mu^{(\infty)}\| \to 0$. Therefore (I) implies (II).

We are left with the task of proving (III). We shall in fact prove the following stronger statement. Consider a multi-Poisson graph generated as in the definition of the ensemble. Let $\hat\Omega^{(t)}_{l,d}$ be the fraction of variable nodes $i \in V$ such that $i \in V_l$ (or, equivalently, $d_i(0) = l$) and $d_i(t) = d$. We claim that $\hat\Omega^{(t)}_{l,d}$ is well approximated by the sequence $\Omega^{(t)}_{l,d}$ defined above. More precisely, there exist constants $A$, $B$ (which may depend on the ensemble parameters as well as on $t$) such that

$$ \mathbb{P}\{|\hat\Omega^{(t)}_{l,d} - \Omega^{(t)}_{l,d}| > \epsilon\} \le A\, e^{-Bn\epsilon^2} . \tag{B.7} $$

Recalling that the degree of a variable node $i \in V_l$ is $l - d_i(t_{\max})$, we have

$$ \Lambda_l = \sum_{l'} \hat\Omega^{(t_{\max})}_{l',\,l'-l} . \tag{B.8} $$

The thesis therefore follows from Eq. (B.7) together with the observation that the sum (B.8) contains a finite number of terms.

The claim is proved by induction on $t$. It is obviously true for $t = 0$. Assume now that the claim is true up to stage $t$, and consider the distribution of $\hat\Omega^{(t+1)}_{l,d}$ conditional on $\hat\Omega^{(t)}_{l,d}$. By the induction hypothesis we can assume that $|\hat\Omega^{(t)}_{l,d} - \Omega^{(t)}_{l,d}| \le \epsilon$. Furthermore, since $\hat\Omega^{(t)}_{l,d} = 0$ whenever $d > l_{\max}$, we can also assume

$$ \Bigl| \sum_{l,d} \hat\Omega^{(t)}_{l,d}\,[d]_+ - \sum_{l,d} \Omega^{(t)}_{l,d}\,[d]_+ \Bigr| \le \epsilon . \tag{B.9} $$

We shall neglect the exponentially rare cases in which these conditions do not hold.

The total number of variable nodes sampled during stage $t+1$ is $\Delta_{\rm tot} = \sum_k k\, m_k$, where $m_k$ is a Poisson random variable with mean $n\gamma P_k/P'(1)$. We can therefore assume that $|\Delta_{\rm tot} - n\gamma| \le n\epsilon$ and neglect the rare cases in which this is false. Next, consider a variable node $i$ such that $d_i(t) = d$. The probability that this node is chosen when selecting one of the neighbors of a function node $a$ is

$$ w_i(t) = \frac{[d_i(t)]_+}{\sum_j [d_j(t)]_+} = \frac{[d]_+}{n \sum_{l',d'} \hat\Omega^{(t)}_{l',d'}\,[d']_+} \equiv w(d) . \tag{B.10} $$

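The probability that, during stage $t$, this variable node is selected $\Delta_i(t) = \Delta$ times (conditional on the total number $\Delta_{\rm tot}$) is therefore

$$ \mathbb{P}[\Delta_i(t) = \Delta \,|\, d_i(t) = d] = \binom{\Delta_{\rm tot}}{\Delta}\, w(d)^{\Delta}\, (1 - w(d))^{\Delta_{\rm tot} - \Delta} = W_t(\Delta|d) + O(1/n) + O(\epsilon) . \tag{B.11} $$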
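The passage from the binomial law to the Poisson kernel can be illustrated numerically. This is a minimal sketch, assuming that $\Delta_{\rm tot}$ equals $n\gamma$ exactly and that $w(d) = d/(n\bar d)$ with $\bar d$ the mean of $[d]_+$; the numerical values and the name `max_gap` are hypothetical.

```python
import math

def binom_pmf(trials, p, k):
    """Binomial probability of k successes in the given number of trials."""
    return math.comb(trials, k) * p ** k * (1.0 - p) ** (trials - k)

def poisson_pmf(lam, k):
    """Poisson probability of k events with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def max_gap(n, gamma, d, mean_d, k_max=20):
    """Largest pointwise gap between Binomial(n*gamma, w(d)) and Poisson(lambda(d)),
    with w(d) = d / (n * mean_d) and lambda(d) = gamma * d / mean_d."""
    trials = round(n * gamma)
    w = d / (n * mean_d)
    lam = gamma * d / mean_d
    return max(abs(binom_pmf(trials, w, k) - poisson_pmf(lam, k))
               for k in range(k_max))

for n in (100, 1000, 10000):
    print(n, max_gap(n, gamma=0.5, d=3, mean_d=3.0))
```

The printed gap shrinks roughly in proportion to $1/n$, in agreement with the $O(1/n)$ correction in (B.11).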

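Therefore, the fraction of variable nodes such that $d_i(0) = l$, $d_i(t) = d$ and $\Delta_i(t) = \Delta$ (to be denoted by $\hat\Omega^{(t)}_{\Delta;l,d}$) is concentrated around $[W_t(\Delta|d) + O(\epsilon)]\,\hat\Omega^{(t)}_{l,d}$. Using the induction hypothesis, this implies that

$$ \mathbb{P}\{|\hat\Omega^{(t)}_{\Delta;l,d} - W_t(\Delta|d)\,\Omega^{(t)}_{l,d}| > \epsilon\} \le A\, e^{-nB\epsilon^2} . \tag{B.12} $$

Next we notice that $d_i(t) = d$ and $\Delta_i(t) = \Delta$ imply $d_i(t+1) = d - \Delta$. Therefore $\hat\Omega^{(t+1)}_{l,d} = \sum_{d' \ge d} \hat\Omega^{(t)}_{d'-d;\,l,d'}$. Since a finite number of terms enter in this sum, Eq. (B.12) implies

$$ \mathbb{P}\Bigl\{ \Bigl| \hat\Omega^{(t+1)}_{l,d} - \sum_{d' \ge d} W_t(d'-d\,|\,d')\,\Omega^{(t)}_{l,d'} \Bigr| > \epsilon \Bigr\} \le A\, e^{-nB\epsilon^2} , \tag{B.13} $$

for some, possibly different, constants $A$ and $B$. Notice that the sum in the above expression is exactly $\Omega^{(t+1)}_{l,d}$. Therefore, thesis (III) is proved. □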



C Derivative of the conditional entropy: Poisson ensemble

In this Appendix we compute the derivative of the conditional entropy with respect to the interpolation parameter, cf. Eq. (6.14). The crucial observation is the following. Let $n$ be a Poisson random variable with parameter $\lambda > 0$, and $f : \mathbb{N} \to \mathbb{R}$ any function on the non-negative integers; then

$$ \frac{d}{d\lambda}\, \mathbb{E} f(n) = \mathbb{E}[f(n+1) - f(n)] . \tag{C.1} $$
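The identity (C.1) is easy to verify numerically. The sketch below compares a central finite difference of $\mathbb{E} f(n)$ with the right-hand side; the test function $f(n) = \log(1 + n^2)$, the value of $\lambda$ and the truncation of the Poisson sum are arbitrary choices made for illustration.

```python
import math

def poisson_expectation(f, lam, n_max=200):
    """E f(n) for n ~ Poisson(lam), truncating the sum at n_max terms."""
    total, p = 0.0, math.exp(-lam)
    for n in range(n_max):
        total += p * f(n)
        p *= lam / (n + 1)   # p is now the probability of n + 1
    return total

f = lambda n: math.log(1.0 + n * n)
lam, h = 2.5, 1e-5

# Left-hand side of (C.1): derivative with respect to lam, by central difference.
lhs = (poisson_expectation(f, lam + h) - poisson_expectation(f, lam - h)) / (2 * h)
# Right-hand side of (C.1): expected discrete increment of f.
rhs = poisson_expectation(lambda n: f(n + 1) - f(n), lam)
print(lhs, rhs)
```

The two printed numbers agree up to the discretization error of the finite difference.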

Consider now the expression (6.8) for the interpolating conditional entropy. This depends upon $t$ through the distributions of the $m_k$'s (i.e. the number of right nodes of degree $k$, which is a Poisson variable with parameter $ntP_k/P'(1)$), and the distribution of the $l_i$'s (i.e. the number of repetitions for the variable $x_i$, which is a Poisson random variable with parameter $\gamma - t$). When differentiating with respect to $t$ we therefore get a sum of several contributions. For the sake of clarity, let us compute in detail one of such terms. In order to single out a term, assume that the parameter entering in the distribution of $m_k$ is $t_k$ and is distinct from $t$ (i.e. the number of right nodes of degree $k$ is a Poisson variable with parameter $nt_kP_k/P'(1)$). Let $Z = Z(m_k)$ be the normalization defined in Eq. (6.6) for a graph with $m_k$ check nodes of degree $k$. Applying the above formula we get

^(Mfc) = ^E s log 2 {Z(m k + l)/Z(m k )} . (C.2) 

The symbol $\mathbb{E}_s$ includes expectation with respect to $m_k$, the choice of $m_k + 1$ check nodes of degree $k$, as well as with respect to the corresponding received message. We can however single out the expectation with respect to the last of these nodes and use the fact that $Z(m_k)$ does not depend on it. Denote by $Z_C(i_1 \dots i_k; y)$ the normalization constant in Eq. (6.6) when a factor $Q_C(y|x_{i_1} \oplus \cdots \oplus x_{i_k})$ is multiplied to the probability distribution. Then we have

^(Mfc) = ^y^%E 8 log 2 {2 c (n...io)/2}. (C.3) 

il-ik 

The same calculation can be repeated for each check degree $k$ as well as for the dependency upon $t$ of the distribution of the $l_i$'s. We introduce the notation $Z_{\rm eff}(i; z)$ for the normalization constant in Eq. (6.6) when a factor $Q_{\rm eff}(z|x_i)$ is multiplied. With these definitions we have (here we set again $t_k = t$)



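\begin{align}
\frac{dh_n}{dt}(t) = {} & \sum_k \frac{P_k}{P'(1)} \frac{1}{n^k} \sum_{i_1 \dots i_k} \mathbb{E}_y \mathbb{E}_s \log_2\{Z_C(i_1 \dots i_k; y)/Z\} - \frac{1}{n} \sum_{i \in V} \mathbb{E}_z \mathbb{E}_s \log_2\{Z_{\rm eff}(i; z)/Z\} \nonumber \\
& - \frac{1}{P'(1)} \sum_y Q_C(y|0) \log_2 Q_C(y|0) + \sum_z Q_{\rm eff}(z|0) \log_2 Q_{\rm eff}(z|0) . \tag{C.4}
\end{align}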



The expression (6.14) is recovered by noticing that $Z_C(i_1 \dots i_k; y)/Z = \langle Q_C(y|x_{i_1} \oplus \cdots \oplus x_{i_k}) \rangle_t$ and $Z_{\rm eff}(i; z)/Z = \langle Q_{\rm eff}(z|x_i) \rangle_t$.



D Positivity of $R_n(t)$: Poisson ensemble

In this Appendix we show that, under the hypotheses of Theorem 2, the remainder $R_n(t)$ in Eq. (6.16) is positive. This completes the proof of the Theorem. We start by writing the remainder in the form

$$ R_n(t) = R_{a,n}(t) - R_{b,n}(t) - \hat R_{a,n}(t) + \hat R_{b,n}(t) , \tag{D.1} $$

where (throughout this Appendix entropies will be measured in nats: this clearly does not affect the sign of $R_n(t)$)

R b , n (t) = -^>E s log( ^[ Z ^\ m \ • (D.3) 

n ^ \Q e ff(^|0) +Q e ff(^|l)/ t 

Analogous definitions are understood for $\hat R_{a,n}(t)$ and $\hat R_{b,n}(t)$, with the $\langle \cdot \rangle_t$ average being substituted by an average over $P_{v_1}(x_1) \cdots P_{v_n}(x_n)$, cf. the passage from Eq. (6.14) to Eq. (6.15). The code average $\mathbb{E}_s$ has of course to be substituted by an average over the $V$ variables, $\mathbb{E}_v$.

We shall treat each of the four terms $R_{a,n}(t), \dots, \hat R_{b,n}(t)$ separately and put everything
together at the end. Let us start from the first term. Using the definitions IJH.9JI and (|4.MJI we 
get 

$$ R_{a,n}(t) = \frac{1}{P'(1)} \sum_k P_k \frac{1}{n^k} \sum_{i_1 \dots i_k} \mathbb{E}_J \mathbb{E}_s \log[1 + \tanh J\, \tanh \ell_{i_1 \dots i_k}] . \tag{D.4} $$

Here we did not write explicitly the dependence of the log-likelihood $\ell_{i_1 \dots i_k}$ for the sum $x_{i_1} \oplus \cdots \oplus x_{i_k}$ upon the received message and the code realization. We notice now that $J$ and $\ell_{i_1 \dots i_k}$ are two independent symmetric random variables. We can therefore apply the
observations of Sec. 0](and in particular Lemma|H]to get 

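$$ R_{a,n}(t) = \sum_{m=1}^{\infty} c_m\, R_{a,n}(t;m) , \tag{D.5} $$

where

$$ c_m = \frac{1}{2m-1} - \frac{1}{2m} > 0 , \tag{D.6} $$

and

$$ R_{a,n}(t;m) = \frac{1}{P'(1)}\, \mathbb{E}_J[(\tanh J)^{2m}] \sum_k P_k \frac{1}{n^k} \sum_{i_1 \dots i_k} \mathbb{E}_s[(\tanh \ell_{i_1 \dots i_k})^{2m}] . \tag{D.7} $$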
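The expansion behind (D.5) and (D.6) can be checked numerically for a pair of independent symmetric variables. The sketch below is only an illustration: it assumes Gaussian variables of law $N(\mu, \mu)$, whose density satisfies $p(-x) = e^{-2x} p(x)$, discretized on a finite grid, and it truncates the series at 400 terms. It checks both that the odd and even moments of $\tanh$ coincide and that $\mathbb{E} \log(1 + \tanh J \tanh \ell) = \sum_m c_m\, \mathbb{E}[(\tanh J)^{2m}]\, \mathbb{E}[(\tanh \ell)^{2m}]$.

```python
import math

def symmetric_gaussian(mu, n=401, width=8.0):
    """Grid approximation of N(mu, mu), whose density obeys p(-x) = exp(-2x) p(x).
    Returns tanh of the grid points and normalized weights."""
    sigma = math.sqrt(mu)
    xs = [mu + width * sigma * (2.0 * i / (n - 1) - 1.0) for i in range(n)]
    ws = [math.exp(-(x - mu) ** 2 / (2.0 * mu)) for x in xs]
    z = sum(ws)
    return [math.tanh(x) for x in xs], [w / z for w in ws]

def direct_average(tj, wj, tl, wl):
    """E log(1 + tanh J tanh l) for independent J and l."""
    return sum(a * b * math.log1p(x * y)
               for x, a in zip(tj, wj) for y, b in zip(tl, wl))

def series_average(tj, wj, tl, wl, terms=400):
    """Truncated series sum_m c_m E[tanh^{2m} J] E[tanh^{2m} l]."""
    pj, pl, total = [1.0] * len(tj), [1.0] * len(tl), 0.0
    for m in range(1, terms + 1):
        pj = [p * x * x for p, x in zip(pj, tj)]   # tanh^{2m} on the J grid
        pl = [p * y * y for p, y in zip(pl, tl)]   # tanh^{2m} on the l grid
        c_m = 1.0 / (2 * m - 1) - 1.0 / (2 * m)
        total += (c_m * sum(w * p for w, p in zip(wj, pj))
                      * sum(w * p for w, p in zip(wl, pl)))
    return total

tj, wj = symmetric_gaussian(1.0)   # hypothetical law for J
tl, wl = symmetric_gaussian(0.5)   # hypothetical law for the log-likelihood
odd = sum(w * t for t, w in zip(tj, wj))
even = sum(w * t * t for t, w in zip(tj, wj))
direct = direct_average(tj, wj, tl, wl)
series = series_average(tj, wj, tl, wl)
print(odd, even)
print(direct, series)
```

The positivity of every coefficient $c_m$ is what makes the sign of each term in the expansion transparent.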

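It is now convenient to introduce the 'spin' variables$^7$ $\sigma_i$, $i \in V$, as follows

$$ \sigma_i = \begin{cases} +1 & \text{if } x_i = 0, \\ -1 & \text{if } x_i = 1. \end{cases} \tag{D.8} $$

Notice that $\tanh \ell_{i_1 \dots i_k} = \langle \sigma_{i_1} \cdots \sigma_{i_k} \rangle_t$. We can also write the $2m$-th power of $\tanh \ell_{i_1 \dots i_k}$ by introducing $2m$ i.i.d. copies $\sigma^{(1)}, \dots, \sigma^{(2m)}$. Using the notation introduced in Eq. (3.7) we get

$$ (\tanh \ell_{i_1 \dots i_k})^{2m} = \langle (\sigma^{(1)}_{i_1} \cdots \sigma^{(1)}_{i_k}) \cdots (\sigma^{(2m)}_{i_1} \cdots \sigma^{(2m)}_{i_k}) \rangle_{t,*} . \tag{D.9} $$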



We substitute this formula in Eq. (D.7), and we are finally able to carry out the sums over $i_1 \dots i_k$ and $k$. The final result is remarkably compact:

$$ R_{a,n}(t;m) = \frac{1}{P'(1)}\, \mathbb{E}_J[(\tanh J)^{2m}]\, \mathbb{E}_s \langle P(Q_{2m}) \rangle_{t,*} , \tag{D.10} $$

where we defined the 'multi-overlaps'

$$ Q_{2m}(\sigma^{(1)}, \dots, \sigma^{(2m)}) = \frac{1}{n} \sum_{i=1}^{n} \sigma^{(1)}_i \cdots \sigma^{(2m)}_i . \tag{D.11} $$

Notice that $Q_{2m} \in [-1, 1]$.

$^7$ This name comes from the statistical mechanics analogy [14].
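The step from (D.7) to (D.10) rests on the identity $n^{-k} \sum_{i_1 \dots i_k} \langle \sigma_{i_1} \cdots \sigma_{i_k} \rangle^{2m} = \langle Q_{2m}^k \rangle_*$. The following sketch checks it by brute force for $m = 1$ on a tiny system. The random measure standing in for $\langle \cdot \rangle_t$, the system size and the function name are assumptions made for illustration.

```python
import itertools
import math
import random

def overlap_identity(n=3, k=3, seed=0):
    """Compare n^{-k} sum_{i1..ik} <s_i1 ... s_ik>^2 with <Q_2^k> over two replicas."""
    rng = random.Random(seed)
    states = list(itertools.product((1, -1), repeat=n))
    weights = [rng.random() for _ in states]
    z = sum(weights)
    mu = [w / z for w in weights]   # arbitrary measure standing in for <.>_t

    # Left side: average over index tuples of the squared correlation.
    left = 0.0
    for idx in itertools.product(range(n), repeat=k):
        corr = sum(p * math.prod(s[i] for i in idx) for s, p in zip(states, mu))
        left += corr ** 2
    left /= n ** k

    # Right side: two independent replicas, Q_2 = (1/n) sum_i s_i^(1) s_i^(2).
    right = 0.0
    for s1, p1 in zip(states, mu):
        for s2, p2 in zip(states, mu):
            q2 = sum(a * b for a, b in zip(s1, s2)) / n
            right += p1 * p2 * q2 ** k
    return left, right

left, right = overlap_identity()
print(left, right)
```

Summing over $k$ with weights $P_k$ then turns the right-hand side into $\langle P(Q_{2m}) \rangle_*$, which is the structure appearing in (D.10).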

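The same procedure can be repeated for $R_{b,n}(t)$. We get $R_{b,n}(t) = \sum_m c_m R_{b,n}(t;m)$, with

\begin{align}
R_{b,n}(t;m) &= \mathbb{E}_u[(\tanh u)^{2m}]\, \frac{1}{n} \sum_i \mathbb{E}_s[(\tanh \ell_i)^{2m}] = \tag{D.12} \\
&= \mathbb{E}_u[(\tanh u)^{2m}]\, \frac{1}{n} \sum_i \mathbb{E}_s[\langle \sigma_i \rangle_t^{2m}] = \tag{D.13} \\
&= \mathbb{E}_u[(\tanh u)^{2m}]\, \frac{1}{n} \sum_i \mathbb{E}_s \langle \sigma^{(1)}_i \cdots \sigma^{(2m)}_i \rangle_{t,*} = \tag{D.14} \\
&= \mathbb{E}_u[(\tanh u)^{2m}]\, \mathbb{E}_s \langle Q_{2m} \rangle_{t,*} . \tag{D.15}
\end{align}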

Let us now consider $\hat R_{a,n}(t)$. Since the probability distribution for the bits $x_i$'s is factorized, the averages can now be easily computed. We get

$$ \hat R_{a,n}(t) = \frac{1}{P'(1)} \sum_k P_k\, \mathbb{E}_J \mathbb{E}_v \log[1 + \tanh J\, \tanh v_1 \cdots \tanh v_k] . \tag{D.16} $$

Notice that in fact the right-hand side is independent both of $n$ and $t$. Once again we observe that $J$ and the $v_i$'s are independent symmetric random variables. Using the same properties of symmetric random variables as above, we obtain $\hat R_{a,n}(t) = \sum_m c_m \hat R_{a,n}(t;m)$, where

\begin{align}
\hat R_{a,n}(t;m) &= \frac{1}{P'(1)}\, \mathbb{E}_J[(\tanh J)^{2m}] \sum_k P_k \{\mathbb{E}_v[(\tanh v)^{2m}]\}^k = \tag{D.17} \\
&= \frac{1}{P'(1)}\, \mathbb{E}_J[(\tanh J)^{2m}]\, P(q_{2m}) , \tag{D.18}
\end{align}

where we defined $q_{2m} = \mathbb{E}_v[(\tanh v)^{2m}] \in [-1, 1]$.

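Finally, the same procedure is applied to $\hat R_{b,n}(t)$. We obtain $\hat R_{b,n}(t) = \sum_m c_m \hat R_{b,n}(t;m)$ with

$$ \hat R_{b,n}(t;m) = \mathbb{E}_u[(\tanh u)^{2m}]\, q_{2m} . \tag{D.19} $$

The next step consists in noticing that, because $U$ and $V$ are admissible, we can apply the admissibility condition to get

$$ \mathbb{E}_u[(\tanh u)^{2m}] = \mathbb{E}_J[(\tanh J)^{2m}] \sum_k \frac{kP_k}{P'(1)} \{\mathbb{E}_v[(\tanh v)^{2m}]\}^{k-1} = \frac{1}{P'(1)}\, \mathbb{E}_J[(\tanh J)^{2m}]\, P'(q_{2m}) . \tag{D.20} $$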

This identity allows us to rewrite Eqs. (D.15) and (D.19) in the form

\begin{align}
R_{b,n}(t;m) &= \frac{1}{P'(1)}\, \mathbb{E}_J[(\tanh J)^{2m}]\, P'(q_{2m})\, \mathbb{E}_s \langle Q_{2m} \rangle_{t,*} , \tag{D.21} \\
\hat R_{b,n}(t;m) &= \frac{1}{P'(1)}\, \mathbb{E}_J[(\tanh J)^{2m}]\, P'(q_{2m})\, q_{2m} . \tag{D.22}
\end{align}

All the series obtained are absolutely convergent because $c_m \sim m^{-2}$ as $m \to \infty$ and $|R_{\cdot,n}(t;m)| \le 1$. We can therefore obtain $R_n(t)$ by performing the sum in Eq. (D.1) term by term. We get

$$ R_n(t) = \frac{1}{P'(1)} \sum_{m=1}^{\infty} c_m\, \mathbb{E}_J[(\tanh J)^{2m}]\, \mathbb{E}_s[\langle f(Q_{2m}, q_{2m}) \rangle_{t,*}] , \tag{D.23} $$

where

$$ f(Q, q) = P(Q) - P'(q)\,Q - P(q) + P'(q)\,q . \tag{D.24} $$

Since we assumed $P(x)$ to be convex for $x \in [-1, 1]$, $f(Q, q) \ge 0$ for any $Q, q \in [-1, 1]$. This completes the proof.
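The role of the convexity assumption can be illustrated on a grid. The sketch below evaluates $f(Q, q) = P(Q) - P'(q)Q - P(q) + P'(q)q$, which is the gap between $P(Q)$ and its tangent at $q$, for two hypothetical check degree polynomials: $P(x) = (x^4 + x^6)/2$, convex on $[-1, 1]$, and $P(x) = x^3$, which is not.

```python
def make_f(coeffs):
    """Build f(Q, q) = P(Q) - P'(q) Q - P(q) + P'(q) q for P(x) = sum_k P_k x^k.
    coeffs maps the degree k to the coefficient P_k."""
    P = lambda x: sum(c * x ** k for k, c in coeffs.items())
    dP = lambda x: sum(c * k * x ** (k - 1) for k, c in coeffs.items())
    return lambda Q, q: P(Q) - dP(q) * Q - P(q) + dP(q) * q

def min_on_grid(f, lo=-1.0, hi=1.0, steps=201):
    """Minimum of f over a uniform grid of (Q, q) in [lo, hi]^2."""
    pts = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(f(Q, q) for Q in pts for q in pts)

f_convex = make_f({4: 0.5, 6: 0.5})   # convex on [-1, 1]
f_cubic = make_f({3: 1.0})            # convex only on [0, 1]
print(min_on_grid(f_convex))          # non-negative up to rounding
print(min_on_grid(f_cubic))           # negative: the tangent bound fails
```

For the cubic example $f(Q, q) = (Q - q)^2 (Q + 2q)$, which is negative whenever $Q < -2q$; this is why convexity on the whole interval $[-1, 1]$, and not only on $[0, 1]$, enters the hypothesis.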



E Derivative of the conditional entropy: multi-Poisson ensemble

Throughout this Section $t_* \in \{0, \dots, t_{\max} - 1\}$ and $s \in [0, \gamma]$ are fixed. Let us start by noticing that the expected conditional entropy with respect to the multi-Poisson ensemble has the structure (here we use the shorthand $Y$ for the received message, which in our formalism is in fact a pair of messages):

$$ h_n = \frac{1}{n}\, \mathbb{E}_C H_n(X|Y) = \frac{1}{n}\, \mathbb{E}_{t_*-1}\, \mathbb{E}_{t_*|t_*-1}\, \mathbb{E}_{t_{\max}-1|t_*} H_n(X|Y) . \tag{E.1} $$

Here we denoted by $\mathbb{E}_{t_2|t_1}$, with $t_2 > t_1$, the expectation with respect to the rounds $t_1 + 1, \dots, t_2$ in the code construction, and by $\mathbb{E}_{t_1}$ the unconditional expectation over the first $t_1$ rounds. Notice that the parameter $s$ enters uniquely in the state $\mathbb{E}_{t_*|t_*-1}$, and more precisely in the mean of the Poisson variables $\{m_k\}$ and $\{l_i(t_*)\}$. We can therefore apply Eq. (C.1) to the expression (E.1). We get
dh n s-^ Pi 



ds 



k { > ii-.ifc v v '' 

(0 

- Y, E» ax _ 1|tt log 2 Z eS (z; z) - E w _!| t , log 2 Z } - 



iGV 



(«) 

P^iy E Ge(v|0) log 2 Qc(y|0) + £ Q eff (^|0) log 2 Q eff (^|0) , (E.2) 



where we used the shorthand $w_i = w_i(t_*)$. The definition of the modified partition functions $Z_C(i_1 \dots i_k; y)$ and $Z_{\rm eff}(i; z)$ is the same as in App. C. The resulting expression is here more complicated because the expectation over the stages $t_* + 1, \dots, t_{\max} - 1$ is not independent of the graph realization after stage $t_*$. For instance, if an extra check node is added during round $t_*$ (as a result of Eq. (C.1)), the following check nodes are going to be added with a slightly different distribution in rounds $t_* + 1, \dots, t_{\max} - 1$. This fact is taken into account by defining the state

ag f vi ows> the end of round t*, set dj(£* + 1) = dj(£*) — Aj(i*) — — i/j with 
equal to the number of times i appears in {ix, ■ ■ ■ ,ik}- Then proceed as for the interpolating 
ensemble introduced in Sec. for rounds i* + 1, . . . , t max — 1. 
We now decompose the underbraced terms in (E.2) as follows:



+ 



| E {n...i k } 



log 2 Z 



-it 



log 2 Z | 



(E.3) 



F(h...i k ) 



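and analogously for terms of type (ii). It is now a matter of simple algebra to obtain Eq. (7.7), where (dropping the dependence of $\varphi$ upon $t_*$ and $s$)

$$ \varphi(n) = \sum_k \frac{P_k}{P'(1)} \sum_{i_1 \dots i_k} w_{i_1} \cdots w_{i_k}\, F(i_1 \dots i_k) - \sum_{i \in V} w_i\, F(i) . \tag{E.4} $$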



We are now faced with the task of proving that $\varphi(n)$ is bounded as claimed in Sec. 7. Denote the quantity in parenthesis as $\phi(n)$. Notice that $|\phi(n)| \le 2n$: in fact $\varphi(n)$ is the difference between the entropies of two $n$-bit distributions.

First, we shall show that $\varphi(n) \le C\sqrt{(\log n)^3/n}$, under the hypothesis that $\sum_i d_i(t_* + 1) \ge An$ for some positive constant $A$. Notice that the condition holds with high probability, at least $1 - 2e^{-Bn}$, for some $B > 0$. The thesis follows from the inequality (hereafter we set $d_i = d_i(t_* + 1)$)



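$$ |\varphi(n)| \le \mathbb{P}\Bigl\{\sum_i d_i \ge An\Bigr\}\, C\sqrt{\frac{(\log n)^3}{n}} + \mathbb{P}\Bigl\{\sum_i d_i < An\Bigr\}\, 2n \le C'\sqrt{\frac{(\log n)^3}{n}} . \tag{E.5} $$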



We start with two simple observations which hold under the above condition.

1. There exists a constant $F_0 > 0$ such that $|F(i_1 \dots i_k)| \le F_0$. $F_0$ is understood to depend on the ensemble parameters as well as on $k$, but not on $n$, $t_*$ or $s$. This fact is proved by noticing that $F(i_1 \dots i_k)$ is the difference between the expectation values of $\log Z$ in two different ensembles. These ensembles can be coupled as was done in App. A. Each time a new variable node must be chosen in the two graphs, choose it by coupling optimally the corresponding $w_i$ distributions. The number of discrepancies obtained in this way is bounded: there is probability $O(1/n)$ of discrepancy (here the condition on $\sum_i d_i$ is used) at each step and less than $n\Lambda'(1)$ steps. Finally the variation in $\log Z$ produced by a single rewiring is smaller than 2 in absolute value.

2. Let $i_1, \dots, i_k$ be i.i.d. with distribution $\{w_i\}$. There exists a constant $w_0$ such that the probability that any two of them coincide is smaller than $w_0/n$. This is proved by noticing that $w_i = [d_i(t_*)]_+ / \sum_j [d_j(t_*)]_+ \le l_{\max}/An$ because of the above condition. Therefore, the probability of having coinciding indices is smaller than $k(k-1)\,l_{\max}/An$.
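The bound in the second observation can be compared with the exact coincidence probability. The sketch below is an illustration under assumed values: $n = 999$ nodes with residual degrees cycling through $1, 2, 3$ (so that $l_{\max} = 3$ and $\sum_i d_i = An$ with $A = 2$) and $k = 4$. The probability that $k$ i.i.d. indices are all distinct is $k!$ times the $k$-th elementary symmetric polynomial of the weights.

```python
def prob_all_distinct(w, k):
    """P(i_1, ..., i_k all distinct) for i.i.d. draws with weights w,
    computed as k! * e_k(w), with e_k the elementary symmetric polynomial."""
    e = [1.0] + [0.0] * k
    for wi in w:
        for j in range(k, 0, -1):   # update in place, highest order first
            e[j] += wi * e[j - 1]
    fact = 1
    for j in range(2, k + 1):
        fact *= j
    return fact * e[k]

n, k, l_max = 999, 4, 3
d = [1 + (i % 3) for i in range(n)]   # hypothetical residual degrees
total = sum(d)
A = total / n                         # here A = 2
w = [di / total for di in d]

p_coincide = 1.0 - prob_all_distinct(w, k)
bound = k * (k - 1) * l_max / (A * n)
print(p_coincide, bound)
```

Both quantities scale like $1/n$, and the exact probability stays below the bound, as expected from a union bound over the $k(k-1)/2$ pairs of indices.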

In a nutshell, these two remarks imply that terms with coincident indices give a contribution bounded by $C/n$ in Eq. (E.4). Moreover, the first of these observations implies $|\phi(n)| \le C_1$ uniformly in $t_*$ and $s$, as claimed.

We next rewrite the function $F(i_1 \dots i_k)$ by singling out the stage $t_* + 1$ in the code construction



\begin{align}
F(i_1 \dots i_k) &= \mathbb{E}^{\{i_1 \dots i_k\}}_{t_*+1|t_*} \bigl[\mathbb{E}_{t_{\max}-1|t_*+1} \log_2 Z\bigr] - \mathbb{E}_{t_*+1|t_*} \bigl[\mathbb{E}_{t_{\max}-1|t_*+1} \log_2 Z\bigr] = \nonumber \\
&= \mathbb{E}^{\{i_1 \dots i_k\}}_{t_*+1|t_*} \Psi(j_1 \dots j_m) - \mathbb{E}_{t_*+1|t_*} \Psi(j_1 \dots j_m) . \tag{E.6}
\end{align}

Here $\Psi(\cdots)$ denotes the quantity in square brackets in the previous line, and we made explicit its dependency upon the variable nodes chosen during stage $t_* + 1$. Notice that $j_1 \dots j_m$ are i.i.d.'s with distributions

$$ w^{\{i_1 \dots i_k\}}_j = \frac{[d_j - \nu_j]_+}{\sum_{j'} [d_{j'} - \nu_{j'}]_+} , \qquad \tilde w_j = \frac{[d_j]_+}{\sum_{j'} [d_{j'}]_+} , \tag{E.7} $$

where $\nu_j$ is (as above) the number of times $j$ appears in $\{i_1 \dots i_k\}$. Notice that $\tilde w_j$ is not the same as $w_j$, the former being computed in terms of the $\{d_j(t_* + 1)\}$ while the latter depends upon $\{d_j(t_*)\}$. The only property of $\Psi(j_1 \dots j_m)$ we shall need hereafter is that it concentrates exponentially when $j_1 \dots j_m$ are distributed according to $\tilde w_j$. More precisely

$$ \mathbb{P}[\,|\Psi - \mathbb{E}\Psi| > n\epsilon\,] \le A\, e^{-nB\epsilon^2} , \tag{E.8} $$

for some positive constants $A$ and $B$. This result is obtained by repeating the proof of the concentration theorem for the quantity $\Psi(j_1 \dots j_m)$.

Now, we use the general fact that, given two distributions $p(s)$ and $q(s)$ such that $p(s) = 0$ only if $q(s) = 0$, we can write



E q f(s) = E p [X(s) f(s)] , X(s) ^ 44 , z = E, 



p(s) 



(E.9) 
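The change of measure used here can be illustrated on a toy example. The sketch below takes two hypothetical distributions on three points and uses the plain likelihood ratio $X(s) = q(s)/p(s)$ as the weight; this choice is an assumption of the sketch and any normalization convention adopted in (E.9) is not reproduced. It also checks that, since $\mathbb{E}_p X = 1$, the difference $\mathbb{E}_q f - \mathbb{E}_p f$ equals the covariance of $X$ and $f$ under $p$.

```python
def expect(dist, g):
    """Expectation of g under a distribution given as a dict state -> probability."""
    return sum(prob * g(s) for s, prob in dist.items())

p = {0: 0.5, 1: 0.3, 2: 0.2}     # hypothetical reference distribution
q = {0: 0.25, 1: 0.25, 2: 0.5}   # hypothetical tilted distribution
f = lambda s: s * s + 1.0
X = lambda s: q[s] / p[s]        # likelihood ratio, well defined since p > 0

lhs = expect(q, f)                          # E_q f
rhs = expect(p, lambda s: X(s) * f(s))      # E_p [X f]
mean_X = expect(p, X)                       # equals 1
cov = rhs - mean_X * expect(p, f)           # covariance of X and f under p
diff = lhs - expect(p, f)                   # E_q f - E_p f
print(lhs, rhs, mean_X, cov, diff)
```

Read this way, a difference of expectations in two nearby ensembles becomes a covariance in a single ensemble, which is the form in which concentration estimates can be applied.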



Applying this general relation to Eq. (E.6), we get

$$ F(i_1 \dots i_k) = \mathbb{E}[X_{\{i_1 \dots i_k\}}(j_1 \dots j_m);\, \Psi(j_1 \dots j_m)] , \tag{E.10} $$

where we denoted $L = \sum_i [d_i]_+$ and $[k] = \sum_i ([d_i]_+ - [d_i - \nu_i]_+)$. Furthermore, we assumed $d_i > 0$. We denote by $V_+$ the set of variable nodes satisfying this condition. Notice that $0 \le [k] \le k$. Moreover $0 \le X_{\{i_1 \dots i_k\}} \le C$ for some constant $C$ (recall that $m = O(n)$) and $\mathbb{E} X_{\{i_1 \dots i_k\}} = 1$. In view of the remarks 1. and 2. above, we focus here on the case in which $i_1, \dots, i_k$ are distinct. Under this assumption

$$ X_{\{i_1 \dots i_k\}} = (1 + \delta(n))\, X_{i_1} \cdots X_{i_k} , \tag{E.12} $$



x - s V-tJ • (E ' 13) 

where $\mathbb{I}_A$ is the indicator function for the event $A$ and $\delta(n)$ is a non-random function, which can be bounded as $|\delta(n)| \le \delta_0/n$. Inserting into Eq. (E.10), and using observation 1 above, we get

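$$ F(i_1 \dots i_k) = (1 + \delta(n))\, \mathbb{E}[X_{i_1} \cdots X_{i_k};\, \Psi] = \mathbb{E}[X_{i_1} \cdots X_{i_k};\, \Psi] + O(1/n) . \tag{E.14} $$

We can now plug this result in Eq. (E.4), to get

\begin{align}
\varphi(n) &= \sum_k \frac{P_k}{P'(1)} \sum_{i_1 \dots i_k} w_{i_1} \cdots w_{i_k}\, \mathbb{E}[X_{i_1} \cdots X_{i_k};\, \Psi] - \sum_{i \in V} w_i\, \mathbb{E}[X_i;\, \Psi] + O(1/n) = \nonumber \\
&= \frac{1}{P'(1)}\, \mathbb{E}\{[P(\overline{X}) - P'(1)\,\overline{X}];\, \Psi\} + O(1/n) = \tag{E.15} \\
&= \frac{1}{P'(1)}\, \mathbb{E}\{f(\overline{X}, \mathbb{E}\overline{X})\,(\Psi - \mathbb{E}\Psi)\} + O(1/n) . \tag{E.16}
\end{align}

Here $f(X, x)$ is defined as in Eq. (D.24) and we introduced the site average $\overline{X} = \sum_{i=1}^{n} w_i X_i$. Furthermore we used the fact that terms with coincident indices induce an error of order $O(1/n)$, and that $\mathbb{E}\overline{X} = 1$. Since $f(X, x)$ is convex and positive with $f(x, x) = 0$, we have

$$ |\varphi(n)| \le C\, \mathbb{E}\{(\overline{X} - \mathbb{E}\overline{X})^2\, |\Psi - \mathbb{E}\Psi|\} + O(1/n) . \tag{E.17} $$

Finally, we notice that $\overline{X}$ satisfies a concentration law of the form

$$ \mathbb{P}[\,|\overline{X} - 1| > \epsilon\,] \le A\, e^{-nB\epsilon^2} . \tag{E.18} $$

The proof is, once again, the same as for Eq. (E.8).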

Using the expression (E.16) together with Eqs. (E.8), (E.18), and the fact that $\Psi \le n$, we finally get

$$ |\phi(n)| \le C_2\, n\,\epsilon^3 + C_2\, n \cdot n\, e^{-nB\epsilon^2} + O(1/n) \le C\sqrt{\frac{(\log n)^3}{n}} , \tag{E.19} $$

where the last inequality follows by choosing $\epsilon = a\sqrt{\log n / n}$.
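The effect of this choice of $\epsilon$ on the two terms of the bound is easy to tabulate. In the sketch below the constants $a$, $B$ and $C_2$ are hypothetical: with $\epsilon = a\sqrt{\log n / n}$ the first term equals $a^3 C_2 \sqrt{(\log n)^3/n}$ and the second equals $C_2\, n^{2 - a^2 B}$, which is negligible as soon as $a^2 B$ is large enough.

```python
import math

def bound_terms(n, a, B, C2=1.0):
    """Evaluate the two terms of the bound for eps = a * sqrt(log n / n)."""
    eps = a * math.sqrt(math.log(n) / n)
    first = C2 * n * eps ** 3                         # polynomial term
    second = C2 * n * n * math.exp(-n * B * eps ** 2)  # large-deviation term
    target = math.sqrt(math.log(n) ** 3 / n)           # claimed rate
    return first, second, target

a, B = 2.0, 1.0   # hypothetical constants with a^2 * B = 4
for n in (10 ** 3, 10 ** 4, 10 ** 6):
    first, second, target = bound_terms(n, a, B)
    print(n, first / target, second, target)
```

The ratio of the first term to the target rate is the constant $a^3$, while the second term decays like $n^{-2}$ for these values.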



F Positivity of $R_n(t_*; s)$: multi-Poisson ensemble

We start as for the Poisson ensemble, by writing the remainder in the form

$$ R_n(t_*, s) = R_{a,n}(t_*, s) - R_{b,n}(t_*, s) - \hat R_{a,n}(t_*, s) + \hat R_{b,n}(t_*, s) , \tag{F.1} $$

where (to lighten formulae, entropies will be measured in nats in this Appendix)
- 2^p, {l) 2^^ W ^ ^ i0 g\ Qc (y\0) + Qc(y\D / A ] 

R*,n{U,s) = ^>E<*Wog( n ^7T7 it) • ^ 

\Q eS (z\0) + Q eS (z\l) / 

Analogous expressions hold for $\hat R_{a,n}(t_*, s)$, $\hat R_{b,n}(t_*, s)$ with the conditional measure $\langle \cdot \rangle$ substituted by the product measure $P_{v_1}(x_1) \cdots P_{v_n}(x_n)$. Here we set $w_i = w_i(t_*)$ as in the previous Section, and we use the notation $\mathbb{E}^{\{i_1 \dots i_k\}}$ introduced there.

The treatment of the four terms $R_{a,n}(\cdot)$, $R_{b,n}(\cdot)$, $\hat R_{a,n}(\cdot)$, $\hat R_{b,n}(\cdot)$ closely parallels the calculations in App. D. Here we limit ourselves to discussing $R_{a,n}(\cdot)$: this should be more than sufficient for understanding the necessary changes with respect to the Poisson case. As in that case, we
use (pPjl and ijOjl to write 

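\begin{align}
R_{a,n}(t_*, s) &= \sum_k \frac{P_k}{P'(1)} \sum_{i_1 \dots i_k} \mathbb{E}_J\, \mathbb{E}^{\{i_1 \dots i_k\}}\, w_{i_1} \cdots w_{i_k} \log[1 + \tanh J\, \tanh \ell_{i_1 \dots i_k}] = \tag{F.4} \\
&= \sum_k \frac{P_k}{P'(1)} \sum_{i_1 \dots i_k} \mathbb{E}_J\, \mathbb{E}\{w_{i_1} \cdots w_{i_k}\, X_{\{i_1 \dots i_k\}} \log[1 + \tanh J\, \tanh \ell_{i_1 \dots i_k}]\} = \tag{F.5} \\
&= \sum_k \frac{P_k}{P'(1)} \sum_{i_1 \dots i_k} \mathbb{E}_J\, \mathbb{E}\{w_{i_1} X_{i_1} \cdots w_{i_k} X_{i_k} \log[1 + \tanh J\, \tanh \ell_{i_1 \dots i_k}]\} + O(1/n) . \nonumber
\end{align}

In passing from Eq. (F.4) to Eq. (F.5), we used the general identity $\mathbb{E}^{\{i_1 \dots i_k\}}[A] = \mathbb{E}[X_{\{i_1 \dots i_k\}} A]$, where $X_{\{i_1 \dots i_k\}}$ is defined in Eq. (E.11). We then used Eq. (E.12) to approximate $X_{\{i_1 \dots i_k\}}$ with an error of order $O(1/n)$.

Now we can Taylor expand the logarithm as in App. D. We obtain

$$ R_{a,n}(t_*, s) = \sum_{m=1}^{\infty} c_m\, R_{a,n}(t_*, s; m) , \tag{F.6} $$

where $c_m = 1/(2m-1) - 1/2m$ and

$$ R_{a,n}(t_*, s; m) = \frac{1}{P'(1)}\, \mathbb{E}_J[(\tanh J)^{2m}]\, \mathbb{E}_s \langle P(Q_{2m}) \rangle_* . \tag{F.7} $$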

The only difference with respect to the Poisson ensemble is in the definition of the 'multi-overlaps'. Now in fact we have (with an abuse of notation, we use the same symbol as for the simple Poisson ensemble):

$$ Q_{2m}(\sigma^{(1)}, \dots, \sigma^{(2m)}) = \sum_{i=1}^{n} w_i X_i\, \sigma^{(1)}_i \cdots \sigma^{(2m)}_i . \tag{F.8} $$

Notice that we no longer have $Q_{2m} \in [-1, +1]$ because of the terms $X_i$. However Eq. (E.13) implies $|X_i| \le \exp(m/L)$. Recall that here $m$ is the number of variable nodes chosen during stage $t_* + 1$, which is exponentially concentrated around its expectation $n\gamma$. On the other hand $L = \sum_i [d_i]_+ \ge \sum_i d_i$, and the last quantity is exponentially concentrated around $n\gamma(t_{\max} - t_*) \ge n\gamma$. Therefore, for any $\epsilon > 0$, we have $|X_i| \le e(1 + \epsilon)$ for any $i \in [n]$ with high probability. As a consequence $|Q_{2m}| \le e(1 + \epsilon)$ with high probability.

The other terms in Eq. (F.1) are treated analogously. We finally get

$$ R_n(t_*, s) = \frac{1}{P'(1)} \sum_{m=1}^{\infty} c_m\, \mathbb{E}_J[(\tanh J)^{2m}]\, \mathbb{E}_s[\langle f(Q_{2m}, q_{2m}) \rangle_*] , \tag{F.9} $$

with $f(X, x)$ defined as in (D.24). Positivity follows from the assumption that $P(x)$ is convex in $[-e(1 + \epsilon), e(1 + \epsilon)]$.